Homogentisate prenyl transferase ("HPT") nucleic acids and polypeptides, and uses thereof

ABSTRACT

The present invention is in the field of plant genetics and biochemistry. More specifically, the present invention relates to genes and polypeptides associated with the tocopherol biosynthesis pathway, namely those encoding homogentisate prenyl transferase activity, and uses thereof.

This application claims priority to U.S. No. 60/365,202 filed Mar. 19, 2002, the disclosure of which is incorporated herein by reference in its entirety.

The present invention is in the field of plant genetics and biochemistry. More specifically, the present invention relates to genes and polypeptides associated with the tocopherol biosynthesis pathway, namely those encoding homogentisate prenyl transferase activity, and uses thereof.

Isoprenoids are ubiquitous compounds found in all living organisms. Plants synthesize a diverse array of greater than 22,000 isoprenoids (Connolly and Hill, Dictionary of Terpenoids, Chapman and Hall, New York, N.Y. (1992)). In plants, isoprenoids play essential roles in particular cell functions such as production of sterols, contributing to eukaryotic membrane architecture, acyclic polyprenoids found in the side chain of ubiquinone and plastoquinone, growth regulators like abscisic acid, gibberellins, brassinosteroids or the photosynthetic pigments chlorophylls and carotenoids. Although the physiological role of other plant isoprenoids is less evident, like that of the vast array of secondary metabolites, some are known to play key roles mediating the adaptative responses to different environmental challenges. In spite of the remarkable diversity of structure and function, all isoprenoids originate from a single metabolic precursor, isopentenyl diphosphate (IPP) (Wright, (1961) Annu. Rev. Biochem., 20:525-548; and Spurgeon and Porter, In: Biosynthesis of Isoprenoid Compounds, Porter and Spurgeon (eds.) John Wiley, NY, Vol. 1, pp. 1-46 (1981)).

A number of unique and interconnected biochemical pathways derived from the isoprenoid pathway leading to secondary metabolites, including tocopherols, exist in chloroplasts of higher plants. Tocopherols not only perform vital functions in plants, but are also important from mammalian nutritional perspectives. In plastids, tocopherols account for up to 40% of the total quinone pool. Tocopherols are an important component of mammalian diets. Epidemiological evidence indicates that tocopherol supplementation can result in decreased risk for cardiovascular disease and cancer, can aid in immune function, and is associated with prevention or retardation of a number of degenerative disease processes in humans (Traber and Sies, Annu. Rev. Nutr., 16:321-347 (1996)). Tocopherol functions, in part, by stabilizing the lipid bilayer of biological membranes (Skrypin and Kagan, Biochim. Biophys. Acta, 815:209 (1995); Kagan, N.Y. Acad. Sci., p. 121 (1989); Gomez-Fernandez et al., Ann. N. Y. Acad. Sci., p. 109 (1989)), reducing polyunsaturated fatty acid (PUFA) free radicals generated by lipid oxidation (Fukuzawa et al., Lipids, 17:511-513 (1982)), and scavenging oxygen free radicals, lipid peroxy radicals and singlet oxygen species (Diplock et al., Ann. N Y Acad. Sci., 570:72 (1989); Fryer, Plant Cell Environ., 15(4):381-392 (1992)).

The compound α-tocopherol, which is often referred to as vitamin E, belongs to a class of lipid-soluble antioxidants that includes α, β, γ, and δ-tocopherols and α, β, γ, and δ-tocotrienols. Although α, β, γ, and δ-tocopherols and α, β, γ, and δ-tocotrienols are sometimes referred to collectively as “vitamin E”, vitamin E is more appropriately defined chemically as α-tocopherol. Vitamin E, or α-tocopherol, is significant for human health, in part because it is readily absorbed and retained by the body, and therefore has a higher degree of bioactivity than other tocopherol species (Traber and Sies, Annu. Rev. Nutr., 16:321-347 (1996)). Other tocopherols, however, such as β, γ, and δ-tocopherols also have significant health and nutritional benefits.

Tocopherols are primarily synthesized only by plants and certain other photosynthetic organisms, including cyanobacteria. As a result, mammalian dietary tocopherols are obtained almost exclusively from these sources. Plant tissues vary considerably in total tocopherol content and tocopherol composition, with α-tocopherol the predominant tocopherol species found in green, photosynthetic plant tissues. Leaf tissue can contain from 10-50 μg of total tocopherols per gram fresh weight, but most of the world's major staple crops (e.g., rice, corn, wheat, potato) produce low to extremely low levels of total tocopherols, of which only a small percentage is α-tocopherol (Hess, Vitamin E, α-tocopherol, In: Antioxidants in Higher Plants, R. Alscher and J. Hess, (eds.), CRC Press, Boca Raton, pp. 111-134 (1993)). Oil seed crops generally contain much higher levels of total tocopherols, but α-tocopherol is present only as a minor component in most oilseeds (Taylor and Barnes, Chemy Ind., October:722-726 (1981)).

The recommended daily dietary intake of 15-30 mg of vitamin E is quite difficult to achieve from the average American diet. For example, it would take over 750 grams of spinach leaves, in which α-tocopherol comprises 60% of total tocopherols, or 200-400 grams of soybean oil to satisfy this recommended daily vitamin E intake. While it is possible to augment the diet with supplements, most of these supplements contain primarily synthetic vitamin E, having eight stereoisomers, whereas natural vitamin E is predominantly composed of only a single isomer. Furthermore, supplements tend to be relatively expensive, and the general population is disinclined to take vitamin supplements on a regular basis. Therefore, there is a need in the art for compositions and methods that either increase the total tocopherol production or increase the relative percentage of α-tocopherol produced by plants.
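As a rough consistency check on the spinach figure above, the total tocopherol content that the leaves would need can be back-calculated from numbers already given in this section. The sketch below is illustrative only: the function name is hypothetical, the 15 mg lower bound of the recommended intake is chosen as the target, and vitamin E is equated with α-tocopherol as defined above.

```python
def implied_total_tocopherol_ug_per_g(target_mg, food_g, alpha_fraction):
    """Total tocopherol content (ug per g fresh weight) a food would need
    so that food_g grams of it supply target_mg of alpha-tocopherol."""
    alpha_ug_needed = target_mg * 1000.0       # convert mg to ug
    alpha_ug_per_g = alpha_ug_needed / food_g  # alpha-tocopherol per gram of food
    # Scale up from the alpha-tocopherol share to total tocopherols.
    return alpha_ug_per_g / alpha_fraction


# Assumed inputs taken from the text: 15 mg target, 750 g spinach, 60% alpha-tocopherol.
spinach = implied_total_tocopherol_ug_per_g(target_mg=15, food_g=750, alpha_fraction=0.60)
print(round(spinach, 1))
```

With these inputs the result is roughly 33 μg per gram fresh weight, which falls inside the 10-50 μg per gram range cited above for leaf tissue.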

In addition to the health benefits of tocopherols, increased α-tocopherol levels in crops have been associated with enhanced stability and extended shelf life of plant products (Peterson, Cereal-Chem., 72(1):21-24 (1995); Ball, Fat-soluble vitamin assays in food analysis. A comprehensive review, London, Elsevier Science Publishers Ltd. (1988)). Further, tocopherol supplementation of swine, beef, and poultry feeds has been shown to significantly increase meat quality and extend the shelf life of post-processed meat products by retarding post-processing lipid oxidation, which contributes to the undesirable flavor components (Sante and Lacourt, J. Sci. Food Agric., 65(4):503-507 (1994); Buckley et al., J. of Animal Science, 73:3122-3130 (1995)).

Tocopherol Biosynthesis

The plastids of higher plants exhibit interconnected biochemical pathways leading to secondary metabolites including tocopherols. The tocopherol biosynthetic pathway in higher plants involves condensation of homogentisic acid and phytylpyrophosphate to form 2-methylphytylplastoquinol (Fiedler et al., Planta, 155:511-515 (1982); Soll et al., Arch. Biochem. Biophys., 204:544-550 (1980); Marshall et al., Phytochem., 24:1705-1711 (1985)). This plant tocopherol pathway can be divided into four parts: 1) synthesis of homogentisic acid (HGA), which contributes to the aromatic ring of tocopherol; 2) synthesis of phytylpyrophosphate, which contributes to the side chain of tocopherol; 3) joining of HGA and phytylpyrophosphate via a homogentisate prenyl transferase followed by a subsequent cyclization; and 4) S-adenosyl methionine dependent methylation of an aromatic ring, which affects the relative abundance of each of the tocopherol species. See FIG. 1.

Various genes and their encoded proteins that are involved in tocopherol biosynthesis include those listed in the table below:

Gene ID     Enzyme name
tyrA        Bifunctional prephenate dehydrogenase
slr1736     Homogentisate prenyl transferase from Synechocystis
ATPT2       Homogentisate prenyl transferase from Arabidopsis thaliana
DXS         1-Deoxyxylulose-5-phosphate synthase
DXR         1-Deoxyxylulose-5-phosphate reductoisomerase
GGPPS       Geranylgeranyl pyrophosphate synthase
HPPD        p-Hydroxyphenylpyruvate dioxygenase
AANT1       Adenylate transporter
slr1737     Tocopherol cyclase
IDI         Isopentenyl diphosphate isomerase
GGH         Geranylgeranyl diphosphate reductase
GMT         Gamma Methyl Transferase
tMT2        Tocopherol methyl transferase 2
MT1         Methyl transferase 1
gcpE        (E)-4-hydroxy-3-methylbut-2-enyl diphosphate synthase

The “Gene IDs” given in the table above identify the gene associated with the listed enzyme. Any of the Gene IDs listed in the table appearing herein in the present disclosure refer to the gene encoding the enzyme with which the Gene ID is associated in the table.

As used herein, HPT, HPT2, PPT, slr1736, and ATPT2 each refer to proteins or genes encoding proteins that have the same activity.

Synthesis of Homogentisic Acid

Homogentisic acid is the common precursor to both tocopherols and plastoquinones. In at least some bacteria the synthesis of homogentisic acid is reported to occur via the conversion of chorismate to prephenate and then to p-hydroxyphenylpyruvate via a bifunctional prephenate dehydrogenase. Examples of bifunctional bacterial prephenate dehydrogenase enzymes include the proteins encoded by the tyrA genes of Erwinia herbicola and Escherichia coli. The tyrA gene product catalyzes the production of prephenate from chorismate, as well as the subsequent dehydrogenation of prephenate to form p-hydroxyphenylpyruvate (p-HPP), the immediate precursor to homogentisic acid. p-HPP is then converted to homogentisic acid by hydroxyphenylpyruvate dioxygenase (HPPD). In contrast, plants are believed to lack prephenate dehydrogenase activity, and it is generally believed that the synthesis of homogentisic acid from chorismate occurs via the synthesis and conversion of the intermediate arogenate. Since pathways involved in homogentisic acid synthesis are also responsible for tyrosine formation, any alterations in these pathways can also result in the alteration in tyrosine synthesis and the synthesis of other aromatic amino acids.

Synthesis of Phytylpyrophosphate

Tocopherols are a member of the class of compounds referred to as the isoprenoids. Other isoprenoids include carotenoids, gibberellins, terpenes, chlorophyll and abscisic acid. A central intermediate in the production of isoprenoids is isopentenyl diphosphate (IPP). Cytoplasmic and plastid-based pathways to generate IPP have been reported. The cytoplasmic based pathway involves the enzymes acetoacetyl CoA thiolase, HMGCoA synthase, HMGCoA reductase, mevalonate kinase, phosphomevalonate kinase, and mevalonate pyrophosphate decarboxylase.

Recently, evidence for the existence of an alternative, plastid based, isoprenoid biosynthetic pathway emerged from studies in the research groups of Rohmer and Arigoni (Eisenreich et al., Chem. Biol., 5:R221-R233 (1998); Rohmer, Prog. Drug. Res., 50:135-154 (1998); Rohmer, Comprehensive Natural Products Chemistry, Vol. 2, pp. 45-68, Barton and Nakanishi (eds.), Pergamon Press, Oxford, England (1999)), who found that the isotope labeling patterns observed in studies on certain eubacterial and plant terpenoids could not be explained in terms of the mevalonate pathway. Arigoni and coworkers subsequently showed that 1-deoxyxylulose, or a derivative thereof, serves as an intermediate of the novel pathway, now referred to as the MEP pathway (Rohmer et al., Biochem. J., 295:517-524 (1993); Schwarz, Ph.D. thesis, Eidgenössische Technische Hochschule, Zurich, Switzerland (1994)). Recent studies showed the formation of 1-deoxyxylulose 5-phosphate (Broers, Ph.D. thesis, Eidgenössische Technische Hochschule, Zurich, Switzerland (1994)) from one molecule each of glyceraldehyde 3-phosphate (Rohmer, Comprehensive Natural Products Chemistry, Vol. 2, pp. 45-68, Barton and Nakanishi (eds.), Pergamon Press, Oxford, England (1999)) and pyruvate (Eisenreich et al., Chem. Biol., 5:R221-R233 (1998); Schwarz supra; Rohmer et al., J. Am. Chem. Soc., 118:2564-2566 (1996); and Sprenger et al., Proc. Natl. Acad. Sci. (U.S.A.), 94:12857-12862 (1997)) by an enzyme encoded by the dxs gene (Lois et al., Proc. Natl. Acad. Sci. (U.S.A.), 95:2105-2110 (1998); and Lange et al., Proc. Natl. Acad. Sci. (U.S.A.), 95:2100-2104 (1998)). 1-Deoxyxylulose 5-phosphate can be further converted into 2-C-methylerythritol 4-phosphate (Arigoni et al., Proc. Natl. Acad. Sci. (U.S.A.), 94:10600-10605 (1997)) by a reductoisomerase encoded by the dxr gene (Bouvier et al., Plant Physiol, 117:1421-1431 (1998); and Rohdich et al., Proc. Natl. Acad. Sci. (U.S.A.), 96:11758-11763 (1999)).

Reported genes in the MEP pathway also include ygbP, which catalyzes the conversion of 2-C-methylerythritol 4-phosphate into its respective cytidyl pyrophosphate derivative, and ygbB, which catalyzes the conversion of 4-phosphocytidyl-2-C-methyl-D-erythritol into 2-C-methyl-D-erythritol, 3,4-cyclophosphate. These genes are tightly linked on the E. coli genome (Herz et al., Proc. Natl. Acad. Sci. (U.S.A.), 97(6):2485-2490 (2000)).

Once IPP is formed by the MEP pathway, it is converted to GGDP by GGDP synthase, and then to phytylpyrophosphate, which is the central constituent of the tocopherol side chain.

Combination and Cyclization

Homogentisic acid is combined with either phytyl-pyrophosphate or solanyl-pyrophosphate by homogentisate prenyl transferase (HPT) forming 2-methylphytyl plastoquinol or 2-methylsolanyl plastoquinol, respectively. 2-methylsolanyl plastoquinol is a precursor to the biosynthesis of plastoquinones, while 2-methylphytyl plastoquinol is ultimately converted to tocopherol.

Methylation of the Aromatic Ring

The major structural difference between each of the tocopherol subtypes is the position of the methyl groups around the phenyl ring. Both 2-methylphytyl plastoquinol and 2-methylsolanyl plastoquinol serve as substrates for the plant enzyme 2-methylphytylplastoquinol/2-methylsolanylplastoquinol methyltransferase (Tocopherol Methyl Transferase 2; Methyl Transferase 2; MT2; tMT2), which is capable of methylating a tocopherol precursor. Subsequent methylation at the 5 position of γ-tocopherol by γ-tocopherol methyl-transferase (GMT) generates the biologically active α-tocopherol.

Some plants, e.g., soy, produce substantial amounts of delta- and subsequently beta-tocopherol in their seed. The formation of δ-tocopherol or β-tocopherol can be prevented by the overexpression of tMT2, resulting in the methylation of the δ-tocopherol precursor, 2-methyl phytyl plastoquinone, to form 2,3-dimethyl-5-phytyl plastoquinone, followed by cyclization with tocopherol cyclase to form γ-tocopherol and a subsequent methylation by GMT to form α-tocopherol. In a possible alternative pathway, β-tocopherol is directly converted to α-tocopherol by tMT2 via the methylation of the 3 position (see, for example, Biochemical Society Transactions, 11:504-510 (1983); Introduction to Plant Biochemistry, 2nd edition, Chapter 11 (1983); Vitamin Hormone, 29:153-200 (1971); Biochemical Journal, 109:577 (1968); and, Biochemical and Biophysical Research Communication, 28(3):295 (1967)). Since all potential mechanisms for the generation of α-tocopherol involve catalysis by tMT2, plants that are deficient in this activity accumulate δ-tocopherol and β-tocopherol. Plants which have increased tMT2 activity tend to accumulate γ-tocopherol and α-tocopherol. Since there is limited GMT activity in the seeds of many plants, these plants tend to accumulate γ-tocopherol.
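The qualitative relationship described above between tMT2 activity, GMT activity, and the tocopherol species that accumulates can be summarized as a small decision function. This is a deliberately simplified sketch, not a kinetic model: the function name is hypothetical, enzyme activities are reduced to present or absent, the possible alternative route from β-tocopherol to α-tocopherol is ignored, and the conversion of δ-tocopherol to β-tocopherol is attributed to GMT as an assumption, since the text above does not name the enzyme for that step.

```python
def predominant_tocopherol(tmt2_active: bool, gmt_active: bool) -> str:
    """Return the tocopherol species expected to accumulate in this simplified model."""
    if tmt2_active:
        # tMT2 methylates the precursor; cyclization then yields gamma-tocopherol,
        # which GMT (when active) methylates at the 5 position to give alpha-tocopherol.
        return "alpha" if gmt_active else "gamma"
    # Without tMT2 the unmethylated precursor is cyclized to delta-tocopherol;
    # the delta-to-beta step is assigned to GMT here as a labeled assumption.
    return "beta" if gmt_active else "delta"


# Enumerate the four on/off combinations of the two activities.
for tmt2 in (False, True):
    for gmt in (False, True):
        print(f"tMT2={tmt2!s:5} GMT={gmt!s:5} -> {predominant_tocopherol(tmt2, gmt)}-tocopherol")
```

In this model, a seed with tMT2 activity but limited GMT activity returns γ-tocopherol, matching the observation above that many seeds accumulate γ-tocopherol.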

There is a need in the art for nucleic acid molecules encoding enzymes involved in tocopherol biosynthesis, as well as related enzymes and antibodies for the enhancement or alteration of tocopherol production in plants. There is a further need for transgenic organisms expressing those nucleic acid molecules involved in tocopherol biosynthesis, which are capable of nutritionally enhancing food and feed sources.

SUMMARY OF THE INVENTION

The present invention includes and provides a substantially purified nucleic acid molecule encoding an amino acid sequence selected from the group consisting of SEQ ID NOs: 5, 9-11, 57-58, and 90.

The present invention includes and provides a substantially purified polypeptide molecule comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 5, 9-11, 57-58, and 90.

The present invention includes and provides an antibody capable of specifically binding a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 5, 9-11, 57-58, and 90.

The present invention includes and provides a substantially purified nucleic acid molecule encoding a polypeptide having homogentisate prenyl transferase activity comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 43 and 44.

The present invention includes and provides a substantially purified polypeptide having homogentisate prenyl transferase activity comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 43 and 44.

The present invention includes and provides a transformed plant comprising an introduced nucleic acid molecule encoding a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 5, 9-11, 43-44, 57-58, and 90, and complements thereof.

The present invention includes and provides a transformed plant comprising an introduced first nucleic acid molecule encoding a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 5, 9-11, 43-44, 57-58, and 90, and complements thereof, and an introduced second nucleic acid molecule encoding an enzyme selected from the group consisting of tyrA, prephenate dehydrogenase, tocopherol cyclase, dxs, dxr, GMT, MT1, tMT2, GCPE, GGPPS, HPPD, AANT1, IDI, GGH, and complements thereof.

The present invention includes and provides a transformed plant comprising a nucleic acid molecule comprising an introduced promoter region which functions in plant cells to cause the production of an mRNA molecule, wherein said introduced promoter region is linked to a transcribed nucleic acid molecule having a transcribed strand and a non-transcribed strand, wherein said transcribed strand is complementary to a nucleic acid molecule encoding a polypeptide selected from the group consisting of SEQ ID NOs: 5, 9-11, 43-44, 57-58, and 90, and wherein said transcribed nucleic acid molecule is linked to a 3′ non-translated sequence that functions in the plant cells to cause termination of transcription and addition of polyadenylated ribonucleotides to a 3′ end of the mRNA sequence.

The present invention includes and provides a method of producing a plant having a seed with an increased total tocopherol level comprising: (A) transforming said plant with an introduced nucleic acid molecule encoding a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 5, 9-11, 43-44, 57-58, and 90; and (B) growing said transformed plant.

The present invention includes and provides a method of producing a plant having a seed with an increased total tocopherol level comprising: (A) transforming said plant with an introduced first nucleic acid molecule, wherein said first nucleic acid molecule encodes a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NOs: 5, 9-11, 43-44, 57-58, and 90, and an introduced second nucleic acid molecule encoding an enzyme selected from the group consisting of tyrA, prephenate dehydrogenase, tocopherol cyclase, dxs, dxr, GMT, MT1, tMT2, GGPPS, GCPE, HPPD, AANT1, IDI, GGH, and complements thereof; and (B) growing said transformed plant.

The present invention includes and provides a seed derived from a transformed plant comprising an introduced nucleic acid molecule encoding a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 5, 9-11, 43-44, 57-58, and 90.

The present invention includes and provides a seed derived from a transformed plant comprising an introduced first nucleic acid molecule encoding an introduced polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 5, 9-11, 43, 44, 57, 58, and 90 and an introduced second nucleic acid encoding an enzyme selected from the group consisting of tyrA, prephenate dehydrogenase, tocopherol cyclase, dxs, dxr, GMT, MT1, GCPE, tMT2, GGPPS, HPPD, AANT1, IDI, GGH, and complements thereof.

The present invention includes and provides a substantially purified polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95 wherein said amino acid sequence is not derived from a nucleic acid molecule that is derived from Nostoc punctiforme, Anabaena, Synechocystis, Zea mays, Glycine max, Arabidopsis thaliana, Oryza sativa, Trichodesmium erythraeum, Chloroflexus aurantiacus, wheat, leek, canola, cotton, or tomato. The present invention includes and provides said substantially purified polypeptide wherein more than one amino acid sequence is selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95.

The present invention includes and provides a substantially purified nucleic acid molecule encoding a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95 wherein said nucleic acid molecule is not derived from Nostoc punctiforme, Anabaena, Synechocystis, Zea mays, Glycine max, Arabidopsis thaliana, Oryza sativa, Trichodesmium erythraeum, Chloroflexus aurantiacus, wheat, leek, canola, cotton, or tomato. The present invention includes and provides said nucleic acid molecule wherein the polypeptide further comprises more than one amino acid sequence selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95.

The present invention includes and provides a substantially purified nucleic acid molecule encoding a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95 wherein said nucleic acid molecule is not derived from Nostoc punctiforme, Anabaena, Synechocystis, Zea mays, Glycine max, Arabidopsis thaliana, Oryza sativa, Sulfolobus, Aeropyum, Trichodesmium erythraeum, Chloroflexus aurantiacus, sorghum, wheat, tomato, or leek. The present invention includes and provides said nucleic acid molecule wherein the polypeptide further comprises more than one amino acid sequence selected from the group consisting of SEQ ID NOs: 39-42, 46-49 and 92-95.

The present invention includes and provides a plant transformed with a nucleic acid molecule encoding a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95 wherein said nucleic acid molecule is not derived from Nostoc punctiforme, Anabaena, Synechocystis, Zea mays, Glycine max, Arabidopsis thaliana, Oryza sativa, Sulfolobus, Aeropyum, Trichodesmium erythraeum, Chloroflexus aurantiacus, sorghum, wheat, tomato, or leek. The present invention includes and provides said nucleic acid molecule wherein the polypeptide further comprises more than one amino acid sequence selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95.

The present invention includes and provides a substantially purified polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95 wherein said polypeptide does not comprise any of the amino acid sequences set forth in sequence listings in WO 00/68393 (which sequences are incorporated herein by reference); WO 00/63391 (which sequences are incorporated herein by reference); WO 01/62781 (which sequences are incorporated herein by reference); or WO 02/33060 (which sequences are incorporated herein by reference); and does not comprise SEQ ID NOs: 1-11, 43-45, 57-58, 61-62, or 90 from the present application.

The present invention includes and provides a substantially purified polypeptide comprising more than one amino acid sequence selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95.

The present invention includes and provides a substantially purified nucleic acid molecule encoding a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95 wherein said nucleic acid molecule does not comprise any of the nucleic acid sequences set forth in sequence listings in WO 00/68393; WO 00/63391; WO 01/62781; or WO 02/33060; and does not comprise SEQ ID NOs: 27-36, 59-60, 88-89, and 91 from the present application, or the gene with GenBank Accession Nos. AI 897027 or AW 563431. The present invention includes and provides said nucleic acid molecule wherein the polypeptide further comprises more than one amino acid sequence selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95.

The present invention includes and provides a plant transformed with a nucleic acid molecule encoding a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95 wherein said nucleic acid molecule does not comprise any of the nucleic acid sequences set forth in sequence listings in WO 00/68393; WO 00/63391; WO 01/62781; or WO 02/33060; and does not comprise SEQ ID NOs: 27-36, 59-60, 88-89, and 91 from the present application, or the gene with GenBank Accession Nos. AI 897027 or AW 563431. The present invention includes and provides said nucleic acid molecule wherein the polypeptide further comprises more than one amino acid sequence selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95.

The present invention includes and provides a substantially purified nucleic acid molecule comprising a nucleic acid sequence selected from the group consisting of SEQ ID NOs: 31, 34-36, 59-60, and 91.

The present invention includes and provides for homogentisate prenyl transferases discovered using one or more of the alignments of FIGS. 2a-2c, 3a-3c, 24a-24b, 25a-25b, 33a-33c, 34a-34b, 35a-35b and 36.

Description of the Nucleic and Amino Acid Sequences

SEQ ID NO: 1 sets forth a Nostoc punctiforme homogentisate prenyl transferase polypeptide.

SEQ ID NO: 2 sets forth an Anabaena homogentisate prenyl transferase polypeptide.

SEQ ID NO: 3 sets forth a Synechocystis homogentisate prenyl transferase polypeptide.

SEQ ID NO: 4 sets forth a Zea mays homogentisate prenyl transferase polypeptide (HPT1).

SEQ ID NO: 5 sets forth a Glycine max homogentisate prenyl transferase polypeptide (HPT1-2).

SEQ ID NO: 6 sets forth a Glycine max homogentisate prenyl transferase polypeptide (HPT1-1).

SEQ ID NO: 7 sets forth an Arabidopsis thaliana homogentisate prenyl transferase polypeptide (HPT1).

SEQ ID NO: 8 sets forth a partial Cuphea pulcherrima homogentisate prenyl transferase polypeptide.

SEQ ID NO: 9 sets forth a leek homogentisate prenyl transferase polypeptide (HPT1).

SEQ ID NO: 10 sets forth a wheat homogentisate prenyl transferase polypeptide (HPT1).

SEQ ID NO: 11 sets forth a Cuphea pulcherrima homogentisate prenyl transferase polypeptide (HPT1).

SEQ ID NOs: 12-15 represent domains from SEQ ID NOs: 1-8.

SEQ ID NOs: 16-26 set forth primer sequences.

SEQ ID NO: 27 sets forth a nucleic acid molecule encoding a Nostoc punctiforme homogentisate prenyl transferase polypeptide.

SEQ ID NO: 28 sets forth a nucleic acid molecule encoding an Anabaena homogentisate prenyl transferase polypeptide.

SEQ ID NO: 29 sets forth a nucleic acid molecule encoding a Synechocystis homogentisate prenyl transferase polypeptide.

SEQ ID NO: 30 sets forth a nucleic acid molecule encoding a Zea mays homogentisate prenyl transferase polypeptide (HPT1).

SEQ ID NO: 31 sets forth a nucleic acid molecule encoding a Glycine max homogentisate prenyl transferase polypeptide (HPT1-2).

SEQ ID NO: 32 sets forth a nucleic acid molecule encoding a Glycine max homogentisate prenyl transferase polypeptide (HPT1-1).

SEQ ID NO: 33 sets forth a nucleic acid molecule encoding an Arabidopsis thaliana homogentisate prenyl transferase polypeptide (HPT1).

SEQ ID NO: 34 sets forth a nucleic acid molecule encoding a Cuphea pulcherrima homogentisate prenyl transferase polypeptide (HPT1).

SEQ ID NO: 35 sets forth a nucleic acid molecule encoding a leek homogentisate prenyl transferase polypeptide (HPT1).

SEQ ID NO: 36 sets forth a nucleic acid molecule encoding a wheat homogentisate prenyl transferase polypeptide (HPT1).

SEQ ID NOs: 37-38 set forth primer sequences.

SEQ ID NOs: 39-42 set forth domains from SEQ ID NOs: 1-7 and 9-11.

SEQ ID NO: 43 sets forth a homogentisate prenyl transferase polypeptide from Trichodesmium erythraeum.

SEQ ID NO: 44 sets forth a homogentisate prenyl transferase polypeptide from Chloroflexus aurantiacus.

SEQ ID NO: 45 sets forth a putative sequence for an Arabidopsis thaliana homogentisate prenyl transferase polypeptide (HPT2).

SEQ ID NOs: 46-49 represent domains from SEQ ID NOs: 1-4, 6-7, 9-11, 57-58 and 91.

SEQ ID NOs: 50-56 set forth primer sequences.

SEQ ID NO: 57 sets forth an Arabidopsis thaliana homogentisate prenyl transferase polypeptide (HPT2).

SEQ ID NO: 58 sets forth an Oryza sativa homogentisate prenyl transferase polypeptide (HPT2).

SEQ ID NO: 59 sets forth a nucleic acid molecule encoding an Arabidopsis thaliana homogentisate prenyl transferase polypeptide (HPT2).

SEQ ID NO: 60 sets forth a nucleic acid molecule encoding an Oryza sativa homogentisate prenyl transferase polypeptide (HPT2).

SEQ ID NO: 61 sets forth a putative homogentisate prenyl transferase polypeptide from Arabidopsis thaliana (HPT2).

SEQ ID NO: 62 sets forth a putative homogentisate prenyl transferase polypeptide from Arabidopsis thaliana (HPT2).

SEQ ID NO: 63 sets forth an EST from Arabidopsis thaliana.

SEQ ID NO: 64 sets forth an EST from Medicago truncatula.

SEQ ID NO: 65 sets forth an EST from Medicago truncatula developing stem.

SEQ ID NO: 66 sets forth an EST from Medicago truncatula developing stem.

SEQ ID NO: 67 sets forth an EST from Medicago truncatula developing stem.

SEQ ID NO: 68 sets forth an EST from mixed potato tissues.

SEQ ID NO: 69 sets forth an EST from Arabidopsis thaliana, Columbia ecotype flower buds.

SEQ ID NO: 70 sets forth an EST from Arabidopsis thaliana.

SEQ ID NO: 71 sets forth an EST from Medicago truncatula.

SEQ ID NO: 72 sets forth an EST from Glycine max.

SEQ ID NOs: 73-83 and 84-87 set forth primer sequences.

SEQ ID NO: 88 sets forth a nucleic acid molecule encoding a homogentisate prenyl transferase polypeptide from cyanobacteria Trichodesmium erythraeum.

SEQ ID NO: 89 sets forth a nucleic acid molecule encoding a homogentisate prenyl transferase polypeptide from photobacteria Chloroflexus aurantiacus.

SEQ ID NO: 90 sets forth a Glycine max homogentisate prenyl transferase polypeptide (HPT2).

SEQ ID NO: 91 sets forth a nucleic acid molecule encoding a homogentisate prenyl transferase polypeptide from Glycine max (HPT2).

SEQ ID NOs: 92-95 represent domains from SEQ ID NOs: 1-4, 6-7, 9-11, 43-44, 57-58, and 90.

Note: cyanobacteria and photobacteria have one HPT. Plants have both HPT1 and HPT2. In soy, there are two variations of HPT1, HPT1-1 and HPT1-2, as well as HPT2.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is a schematic diagram of the tocopherol biosynthetic pathway.

FIGS. 2a-2c depict a sequence alignment for several homogentisate prenyl transferase polypeptides (SEQ ID NOs: 1-8).

FIGS. 3a-3c depict a sequence alignment for several homogentisate prenyl transferase polypeptides (SEQ ID NOs: 1-7, and 9-11).

FIG. 4 provides a schematic of the expression construct pCGN10800.

FIG. 5 provides a schematic of the expression construct pCGN10801.

FIG. 6 provides a schematic of the expression construct pCGN10803.

FIG. 7 provides a schematic of the expression construct pCGN10822.

FIG. 8 provides bar graphs of HPLC data obtained from seed extracts of transgenic Arabidopsis containing pCGN10822, which provides for the expression of the ATPT2 sequence (SEQ ID NO: 33), in the sense orientation, from the napin promoter. Provided are graphs for α, γ, and δ-tocopherols, as well as total tocopherol for 22 transformed lines, as well as a nontransformed (wild-type) control.

FIG. 9 provides a bar graph of HPLC analysis of seed extracts from Arabidopsis plants transformed with pCGN10803 (lines 1387 through 1624, enhanced 35S-ATPT2, in the antisense orientation), a nontransformed (wt) control, and an empty vector transformed control.

FIG. 10 provides a schematic of the expression construct pMON36581.

FIG. 11 provides a schematic of the expression construct pMON69933.

FIG. 12 provides a schematic of the expression construct pMON69924.

FIG. 13 provides a schematic of the expression construct pMON69943.

FIG. 14 provides a bar graph of total tocopherol levels in recombinant soy lines.

FIG. 15 depicts pMON 69960.

FIG. 16 depicts pMON 36525.

FIG. 17 depicts pMON 69963.

FIG. 18 depicts pMON 69965.

FIG. 19 depicts pMON 10098.

FIG. 20 depicts pMON 69964.

FIG. 21 depicts pMON 69966.

FIG. 22 depicts results of seed total tocopherol analysis.

FIG. 23 depicts results of seed total tocopherol analysis.

FIG. 24 depicts the alignments of SEQ ID NOs: 1-4, 6-7, 9-11, 57, and 90.

FIG. 25 depicts motifs V through VIII, SEQ ID NOs: 46-49.

FIG. 26 depicts a sequence tree derived from a multiple alignment of SEQ ID NOs: 1-7, 9-11, 43, 44, 57-58, and 90.

FIG. 27 depicts pMON81028.

FIG. 28 depicts pMON81023.

FIG. 29 depicts pMON36596.

FIG. 30 depicts pET30a(+) vector.

FIG. 31 depicts pMON69993.

FIG. 32 depicts pMON69992.

FIGS. 33 a-33 c depict a sequence alignment for several homogentisate prenyl transferase polypeptides (SEQ ID NOs: 1-4, 6-7, 9-11, 43-44, 57-58, and 90).

FIG. 34 depicts motifs IX through XII, SEQ ID NOs: 92-95.

FIG. 35 depicts motifs I-IV, SEQ ID NOs: 39-42.

FIG. 36 depicts motifs A-D.

DETAILED DESCRIPTION

The present invention provides a number of agents, for example, nucleic acid molecules and polypeptides associated with the synthesis of tocopherol, and provides uses of such agents.

Agents

The agents of the present invention will preferably be “biologically active” with respect to either a structural attribute, such as the capacity of a nucleic acid to hybridize to another nucleic acid molecule, or the ability of a protein to be bound by an antibody (or to compete with another molecule for such binding). Alternatively, such an attribute may be catalytic and thus involve the capacity of the agent to mediate a chemical reaction or response. The agents will preferably be “substantially purified”. The term “substantially purified”, as used herein, refers to a molecule separated from substantially all other molecules normally associated with it in its native environmental conditions. More preferably a substantially purified molecule is the predominant species present in a preparation. A substantially purified molecule may be greater than about 60% free, preferably about 75% free, more preferably about 90% free, and most preferably about 95% free from the other molecules (exclusive of solvent) present in the natural mixture. The term “substantially purified” is not intended to encompass molecules present in their native environmental conditions.

The agents of the present invention may also be recombinant. As used herein, the term recombinant means any agent (e.g., DNA, peptide, etc.) that is, or results, however indirectly, from human manipulation of a nucleic acid molecule.

It is understood that the agents of the present invention may be labeled with reagents that facilitate detection of the agent (e.g., fluorescent labels, Prober et al., Science, 238:336-340 (1987); Albarella et al., EP 144 914; chemical labels, Sheldon et al., U.S. Pat. No. 4,582,789; Albarella et al., U.S. Pat. No. 4,563,417; modified bases, Miyoshi et al., EP 119 448).

Nucleic Acid Molecules

Agents of the present invention include nucleic acid molecules. In a preferred aspect of the present invention the nucleic acid molecule comprises a nucleic acid sequence that encodes a homogentisate prenyl transferase. As used herein, a homogentisate prenyl transferase is any plant protein that is capable of specifically catalyzing the formation of 2-methyl-6-phytylbenzoquinol (2-methyl-6-geranylgeranylbenzoquinol) from phytyl-DP (GGDP) and homogentisate.

An example of a more preferred homogentisate prenyl transferase is a polypeptide with the amino acid sequence selected from the group consisting of SEQ ID NOs: 5, 9-11, 43-44, 55, 58, and 90. In a more preferred embodiment, the homogentisate prenyl transferase is encoded by any nucleic acid molecule encoding an amino acid sequence selected from the group consisting of SEQ ID NOs: 5, 9-11, 43-44, 55, 58, and 90.

In another preferred aspect of the present invention the nucleic acid molecule of the present invention comprises a nucleic acid sequence encoding a polypeptide selected from the group consisting of SEQ ID NOs: 5, 9-11, 43-44, 55, 58, and 90, and complements thereof and fragments of either.

In another preferred aspect of the present invention the nucleic acid molecule of the present invention comprises a nucleic acid sequence selected from the group consisting of SEQ ID NOs: 31, 34-36, 59-60, and 91.

In another embodiment, the present invention includes nucleic acid molecules encoding polypeptides having a region of conserved amino acid sequence shown in any of FIGS. 2 a-2 c, 3 a-3 c, 24 a-24 b, 25 a-25 b, 33 a-33 c, 34 a-34 b, 35 a-35 b and 36, and complements of those nucleic acid molecules. In a preferred embodiment, the present invention includes nucleic acid molecules encoding polypeptides comprising a sequence selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95, and complements of those nucleic acid molecules. The present invention includes and provides said nucleic acid molecule wherein the polypeptide further comprises more than one amino acid sequence selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95.

In a further preferred embodiment the present invention includes nucleic acid molecules encoding polypeptides comprising two or more, three or more, or four sequences selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95, and complements of those nucleic acid molecules. In another embodiment, the present invention includes nucleic acid molecules encoding polypeptides having homogentisate prenyl transferase activity and a region of conserved amino acid sequence shown in any of FIGS. 2 a-2 c, 3 a-3 c, 24 a-24 b, 25 a-25 b, 33 a-33 c, 34 a-34 b, 35 a-35 b and 36, and complements of those nucleic acid molecules. In a preferred embodiment, the present invention includes nucleic acid molecules encoding polypeptides having homogentisate prenyl transferase activity and comprising a sequence selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95 and complements of those nucleic acid molecules. The present invention includes and provides said nucleic acid molecule wherein the polypeptide further comprises more than one amino acid sequence selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95.

In a further preferred embodiment the present invention includes nucleic acid molecules encoding polypeptides having homogentisate prenyl transferase activity and comprising two or more, three or more, or four sequences selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95, and complements of those nucleic acid molecules. In another embodiment, the present invention includes nucleic acid molecules, excluding nucleic acid molecules derived from Nostoc punctiforme, Anabaena, Synechocystis, Zea mays, Glycine max, Arabidopsis thaliana, Oryza sativa, Trichodesmium erythraeum, Chloroflexus aurantiacus, wheat, leek, canola, cotton, or tomato, encoding polypeptides having a region of conserved amino acid sequence shown in any of FIGS. 2 a-2 c, 3 a-3 c, 24 a-24 b, 25 a-25 b, 33 a-33 c, 34 a-34 b, 35 a-35 b and 36, and complements of those nucleic acid molecules. In a preferred embodiment, the present invention includes nucleic acid molecules, excluding nucleic acid molecules derived from Nostoc punctiforme, Anabaena, Synechocystis, Zea mays, Glycine max, Arabidopsis thaliana, Oryza sativa, Trichodesmium erythraeum, Chloroflexus aurantiacus, wheat, leek, canola, cotton, or tomato, encoding polypeptides comprising a sequence selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95 and complements of those nucleic acid molecules. The present invention includes and provides said nucleic acid molecule wherein the polypeptide further comprises more than one amino acid sequence selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95.

In a further preferred embodiment the present invention includes nucleic acid molecules, excluding nucleic acid molecules derived from Nostoc punctiforme, Anabaena, Synechocystis, Zea mays, Glycine max, Arabidopsis thaliana, Oryza sativa, Trichodesmium erythraeum, Chloroflexus aurantiacus, wheat, leek, canola, cotton, or tomato, encoding polypeptides comprising two or more, three or more, or four sequences selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95.

In another embodiment, the present invention includes nucleic acid molecules, excluding nucleic acid molecules derived from Nostoc punctiforme, Anabaena, Synechocystis, Zea mays, Glycine max, Arabidopsis thaliana, Oryza sativa, Trichodesmium erythraeum, Chloroflexus aurantiacus, wheat, leek, canola, cotton, Sulfolobus, Aeropyrum, sorghum, or tomato, encoding polypeptides having homogentisate prenyl transferase activity and a region of conserved amino acid sequence shown in any of FIGS. 2 a-2 c, 3 a-3 c, 24 a-24 b, 25 a-25 b, 33 a-33 c, 34 a-34 b, 35 a-35 b and 36 and complements of those nucleic acid molecules. In a preferred embodiment, the present invention includes nucleic acid molecules, excluding nucleic acid molecules derived from Nostoc punctiforme, Anabaena, Synechocystis, Zea mays, Glycine max, Arabidopsis thaliana, Oryza sativa, Trichodesmium erythraeum, Chloroflexus aurantiacus, wheat, leek, canola, cotton, or tomato, encoding polypeptides having homogentisate prenyl transferase activity and comprising a sequence selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95. The present invention includes and provides said nucleic acid molecule wherein the polypeptide further comprises more than one amino acid sequence selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95.

In a further preferred embodiment the present invention includes nucleic acid molecules, excluding nucleic acid molecules derived from Nostoc punctiforme, Anabaena, Synechocystis, Zea mays, Glycine max, Arabidopsis thaliana, Oryza sativa, Trichodesmium erythraeum, Chloroflexus aurantiacus, wheat, leek, canola, cotton, or tomato, encoding polypeptides having homogentisate prenyl transferase activity and comprising two or more, three or more, or four sequences selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95.

In one embodiment of a method of the present invention, any of the nucleic acid sequences or polypeptide sequences, or fragments of either, of the present invention can be used to search for related sequences. In a preferred embodiment, a member selected from the group consisting of SEQ ID NOs: 5, 9-11, 43-44, 57-58, and 90 is used to search for related sequences. In a preferred embodiment, a member selected from the group consisting of SEQ ID NOs: 31, 34-36, 59-60, 88-89, and 91 is used to search for related sequences. In another embodiment, any of the motifs or regions of conserved sequence shown in FIGS. 2 a-2 c, 3 a-3 c, 24 a-24 b, 25 a-25 b, 33 a-33 c, 34 a-34 b, 35 a-35 b and 36 are used to search for related amino acid sequences. In a preferred embodiment, a member selected from the group consisting of SEQ ID NOs: 39-42 and 46-49 is used to search for related sequences. In one embodiment, one or more of SEQ ID NOs: 39-42, 46-49, and 92-95 is used to search for related sequences. As used herein, “search for related sequences” means any method of determining relatedness between two sequences, including, but not limited to, searches that compare sequence homology: for example, a PBLAST search of a database for relatedness to a single amino acid sequence. Other searches may be conducted using profile based methods, such as the HMM (Hidden Markov model) META-MEME (http://metameme.sdsc.edu/mhmm-links.html) or PSI-BLAST (http://www.ncbi.nlm.nih.gov/BLAST/). The present invention includes and provides for homogentisate prenyl transferases discovered using one or more of the alignments of FIGS. 2 a-2 c, 3 a-3 c, 24 a-24 b, 25 a-25 b, 33 a-33 c, 34 a-34 b, 35 a-35 b and 36.

As used herein, a nucleic acid molecule is said to be “derived from” a particular organism, species, ecotype, etc., when the sequence of the nucleic acid molecule originated from that organism, species, ecotype, etc. “Derived from” therefore includes copies of nucleic acid molecules derived through, for example, PCR, as well as synthetically generated nucleic acid molecules having the same nucleic acid sequence as the original organism, species, ecotype, etc. Likewise, a polypeptide is said to be “derived from” a nucleic acid molecule when that nucleic acid molecule is used to code for the polypeptide, whether the polypeptide is enzymatically generated from the nucleic acid molecule or synthesized based on the sequence information inherent in the nucleic acid molecule.

The present invention includes the use of the above-described conserved sequences and fragments thereof in transgenic plants, other organisms, and for other uses, including, without limitation, as described below.

In another preferred aspect of the present invention a nucleic acid molecule comprises nucleotide sequences encoding a plastid transit peptide operably fused to a nucleic acid molecule that encodes a protein or fragment of the present invention.

In another preferred embodiment of the present invention, the nucleic acid molecules of the present invention encode mutant tocopherol homogentisate prenyl transferase enzymes. As used herein, a mutant enzyme is any enzyme that contains an amino acid that is different from the amino acid in the same position of a wild type enzyme of the same type.

It is understood that in a further aspect of nucleic acid sequences of the present invention, the nucleic acids can encode a protein that differs from any of the proteins in that one or more amino acids have been deleted, substituted or added without altering the function. For example, it is understood that codons capable of coding for such conservative amino acid substitutions are known in the art.

In one aspect of the present invention the nucleic acids of the present invention are said to be introduced nucleic acid molecules. A nucleic acid molecule is said to be “introduced” if it is inserted into a cell or organism as a result of human manipulation, no matter how indirect. Examples of introduced nucleic acid molecules include, without limitation, nucleic acids that have been introduced into cells via transformation, transfection, injection, and projection, and those that have been introduced into an organism via conjugation, endocytosis, phagocytosis, etc.

One subset of the nucleic acid molecules of the present invention is fragment nucleic acid molecules. Fragment nucleic acid molecules may consist of significant portion(s) of, or indeed most of, the nucleic acid molecules of the present invention, such as those specifically disclosed. Alternatively, the fragments may comprise smaller oligonucleotides (having from about 15 to about 400 nucleotide residues and more preferably, about 15 to about 30 nucleotide residues, or about 50 to about 100 nucleotide residues, or about 100 to about 200 nucleotide residues, or about 200 to about 400 nucleotide residues, or about 275 to about 350 nucleotide residues).

A fragment of one or more of the nucleic acid molecules of the present invention may be a probe and specifically a PCR probe. A PCR probe is a nucleic acid molecule capable of initiating a polymerase activity while in a double-stranded structure with another nucleic acid. Various methods for determining the structure of PCR probes and PCR techniques exist in the art. Computer generated searches using programs such as Primer3 (www-genome.wi.mit.edu/cgi-bin/primer/primer3.cgi), STSPipeline (www-genome.wi.mit.edu/cgi-bin/www-STS_Pipeline), or GeneUp (Pesole et al., BioTechniques, 25:112-123 (1998)), for example, can be used to identify potential PCR primers.

Nucleic acid molecules or fragments thereof of the present invention are capable of specifically hybridizing to other nucleic acid molecules under certain circumstances. Nucleic acid molecules of the present invention include those that specifically hybridize to those nucleic acid molecules disclosed herein, such as those encoding any of SEQ ID NOs: 5, 9-11, 43-44, 57-58, and 90, and complements thereof. Nucleic acid molecules of the present invention include those that specifically hybridize to nucleic acid molecules comprising a member selected from the group consisting of SEQ ID NOs: 31, 34-36, 59-60, and 91, and complements thereof.

As used herein, two nucleic acid molecules are said to be capable of specifically hybridizing to one another if the two molecules are capable of forming an anti-parallel, double-stranded nucleic acid structure.

A nucleic acid molecule is said to be the “complement” of another nucleic acid molecule if they exhibit complete complementarity. As used herein, molecules are said to exhibit “complete complementarity” when every nucleotide of one of the molecules is complementary to a nucleotide of the other. Two molecules are said to be “minimally complementary” if they can hybridize to one another with sufficient stability to permit them to remain annealed to one another under at least conventional “low-stringency” conditions. Similarly, the molecules are said to be “complementary” if they can hybridize to one another with sufficient stability to permit them to remain annealed to one another under conventional “high-stringency” conditions. Conventional stringency conditions are described by Sambrook et al., Molecular Cloning, A Laboratory Manual, 2nd Ed., Cold Spring Harbor Press, Cold Spring Harbor, N.Y. (1989), and by Haymes et al., Nucleic Acid Hybridization, A Practical Approach, IRL Press, Washington, D.C. (1985). Departures from complete complementarity are therefore permissible, as long as such departures do not completely preclude the capacity of the molecules to form a double-stranded structure. Thus, in order for a nucleic acid molecule to serve as a primer or probe it need only be sufficiently complementary in sequence to be able to form a stable double-stranded structure under the particular solvent and salt concentrations employed.
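The definition of “complete complementarity” can be illustrated with a short Python sketch. It is only an illustration under simplifying assumptions: sequences are plain DNA strings written 5' to 3', only the four standard bases are handled, and the function names are hypothetical. Counting mismatches says nothing about whether two molecules would actually remain annealed under given stringency conditions.

```python
# Watson-Crick pairing for the four standard DNA bases (assumption: no ambiguity codes).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the anti-parallel complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_complete_complement(a: str, b: str) -> bool:
    """True when every nucleotide of a pairs with a nucleotide of b (anti-parallel)."""
    return len(a) == len(b) and reverse_complement(a) == b.upper()


def mismatch_count(a: str, b: str) -> int:
    """Number of positions that depart from complete complementarity (equal lengths only)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(reverse_complement(a), b.upper()))
```

For example, `is_complete_complement("ATGC", "GCAT")` holds, while `"ATGC"` and `"GCAA"` differ from complete complementarity at one position.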

Appropriate stringency conditions which promote DNA hybridization, for example, 6.0× sodium chloride/sodium citrate (SSC) at about 45° C., followed by a wash of 2.0×SSC at 20-25° C., are known to those skilled in the art or can be found in Current Protocols in Molecular Biology, John Wiley & Sons, NY (1989), 6.3.1-6.3.6. For example, the salt concentration in the wash step can be selected from a low stringency of about 2.0×SSC at 50° C. to a high stringency of about 0.2×SSC at 65° C. In addition, the temperature in the wash step can be increased from low stringency conditions at room temperature, about 22° C., to high stringency conditions at about 65° C. Both temperature and salt may be varied, or either the temperature or the salt concentration may be held constant while the other variable is changed.

In a preferred embodiment, a nucleic acid of the present invention will specifically hybridize to one or more of the nucleic acid molecules described herein and complements thereof, such as those encoding any of SEQ ID NOs: 5, 9-11, 43-44, 57-58, and 90, under moderately stringent conditions, for example at about 2.0×SSC and about 65° C.

In a particularly preferred embodiment, a nucleic acid of the present invention will include those nucleic acid molecules that specifically hybridize to one or more nucleic acid molecules encoding any of SEQ ID NOs: 5, 9-11, 43-44, 57-58, and 90, and complements thereof, under high stringency conditions such as 0.2×SSC and about 65° C.
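The example wash conditions named above (about 2.0×SSC at 50° C. as low stringency, about 2.0×SSC at about 65° C. as moderate, and about 0.2×SSC at about 65° C. as high) can be organized as a small Python lookup. This is a rough sketch only: the function name is hypothetical, and treating those example values as hard cutoffs is an assumption made for illustration, since the text presents them as examples on a continuum.

```python
def wash_stringency(ssc: float, temp_c: float) -> str:
    """Label a wash step using the example conditions from the text.

    Assumption: the example values (0.2x SSC, 2.0x SSC, 65 C) are used as
    cutoffs; the text itself only offers them as representative points.
    """
    if ssc <= 0.2 and temp_c >= 65:
        return "high"
    if ssc <= 2.0 and temp_c >= 65:
        return "moderate"
    return "low"
```

Lower salt and higher temperature both raise stringency, which is why either variable can be held constant while the other is changed.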

In one aspect of the present invention, the nucleic acid molecules of the present invention have one or more nucleic acid sequences encoding SEQ ID NOs: 5, 9-11, 43-44, 57-58, and 90, or complements thereof. In another aspect of the present invention, one or more of the nucleic acid molecules of the present invention share between about 100% and about 90% sequence identity with one or more of the nucleic acid sequences encoding SEQ ID NOs: 5, 9-11, 43-44, 57-58, and 90, and complements thereof, and fragments of either. In a further aspect of the present invention, one or more of the nucleic acid molecules of the present invention share between about 100% and about 95% sequence identity with one or more of the nucleic acid sequences encoding SEQ ID NOs: 5, 9-11, 43-44, 57-58, and 90, and complements thereof, and fragments of either. In a more preferred aspect of the present invention, one or more of the nucleic acid molecules of the present invention share between about 100% and about 98% sequence identity with one or more of the nucleic acid sequences encoding SEQ ID NOs: 5, 9-11, 43-44, 57-58, and 90, and complements thereof, and fragments of either. In an even more preferred aspect of the present invention, one or more of the nucleic acid molecules of the present invention share between about 100% and about 99% sequence identity with one or more of the sequences encoding SEQ ID NOs: 5, 9-11, 43-44, 57-58, and 90, and complements thereof, and fragments of either.

In a preferred embodiment the percent identity calculations are performed using BLASTN or BLASTP (default parameters, version 2.0.8, Altschul et al., Nucleic Acids Res., 25:3389-3402 (1997)).
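The identity ranges above can be illustrated with a simple Python sketch. It is not BLASTN or BLASTP: it assumes the two sequences have already been aligned to equal length with `-` as the gap character, and the function names and the gap-handling rule are illustrative choices, so its values will not necessarily match BLAST output.

```python
from typing import Optional


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns that hold identical residues.

    Assumptions: inputs are pre-aligned and equal in length; a column with a
    gap in only one sequence counts as a mismatch; columns that are gaps in
    both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_tier(pct: float) -> Optional[int]:
    """Return the tightest lower bound named in the text (99, 98, 95, or 90) that pct meets."""
    for threshold in (99, 98, 95, 90):
        if pct >= threshold:
            return threshold
    return None
```

Two aligned 10-residue sequences that differ at one position give 90.0, which meets only the broadest of the four ranges.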

A nucleic acid molecule of the present invention can also encode a homolog polypeptide. As used herein, a homolog polypeptide molecule or fragment thereof is a counterpart protein molecule or fragment thereof in a second species (e.g., corn rubisco small subunit is a homolog of Arabidopsis rubisco small subunit). A homolog can also be generated by molecular evolution or DNA shuffling techniques, so that the molecule retains at least one functional or structural characteristic of the original polypeptide (see, for example, U.S. Pat. No. 5,811,238).

In another embodiment, the homolog is selected from the group consisting of alfalfa, Arabidopsis, barley, Brassica campestris, Brassica napus, oilseed rape, broccoli, cabbage, canola, citrus, cotton, garlic, oat, Allium, flax, an ornamental plant, peanut, pepper, potato, rapeseed, rice, rye, sorghum, strawberry, sugarcane, sugarbeet, tomato, wheat, poplar, pine, fir, eucalyptus, apple, lettuce, lentils, grape, banana, tea, turf grasses, sunflower, soybean, corn, Phaseolus, crambe, mustard, castor bean, sesame, cottonseed, linseed, safflower, and oil palm. More particularly, preferred homologs are selected from canola, corn, Brassica campestris, Brassica napus, oilseed rape, soybean, crambe, mustard, castor bean, peanut, sesame, cottonseed, linseed, rapeseed, safflower, oil palm, flax, and sunflower. In an even more preferred embodiment, the homolog is selected from the group consisting of canola, rapeseed, corn, Brassica campestris, Brassica napus, oilseed rape, soybean, sunflower, safflower, oil palms, and peanut. In a particularly preferred embodiment, the homolog is soybean. In a particularly preferred embodiment, the homolog is canola. In a particularly preferred embodiment, the homolog is oilseed rape.

In a preferred embodiment, nucleic acid molecules encoding SEQ ID NOs: 5, 9-11, 43-44, 57-58, and 90, and complements thereof, and fragments of either; or more preferably encoding SEQ ID NOs: 5, 9-11, 43-44, 57-58, and 90, and complements thereof, can be utilized to obtain such homologs.

In another further aspect of the present invention, nucleic acid molecules of the present invention can comprise sequences that differ from those encoding a polypeptide or fragment thereof due to the fact that a polypeptide can have one or more conservative amino acid changes, and nucleic acid sequences coding for the polypeptide can therefore have sequence differences. It is understood that codons capable of coding for such conservative amino acid substitutions are known in the art.

It is well known in the art that one or more amino acids in a native sequence can be substituted with other amino acid(s), the charge and polarity of which are similar to that of the native amino acid, i.e., a conservative amino acid substitution. Conservative substitutes for an amino acid within the native polypeptide sequence can be selected from other members of the class to which the amino acid belongs. Amino acids can be divided into the following four groups: (1) acidic amino acids; (2) basic amino acids; (3) neutral polar amino acids; and (4) neutral nonpolar amino acids. Representative amino acids within these various groups include, but are not limited to, (1) acidic (negatively charged) amino acids such as aspartic acid and glutamic acid; (2) basic (positively charged) amino acids such as arginine, histidine, and lysine; (3) neutral polar amino acids such as glycine, serine, threonine, cysteine, cystine, tyrosine, asparagine, and glutamine; and (4) neutral nonpolar (hydrophobic) amino acids such as alanine, leucine, isoleucine, valine, proline, phenylalanine, tryptophan, and methionine.

Conservative amino acid substitution within the native polypeptide sequence can be made by replacing one amino acid from within one of these groups with another amino acid from within the same group. In a preferred aspect, biologically functional equivalents of the proteins or fragments thereof of the present invention can have ten or fewer conservative amino acid changes, more preferably seven or fewer conservative amino acid changes, and most preferably five or fewer conservative amino acid changes. The encoding nucleotide sequence will thus have corresponding base substitutions, permitting it to encode biologically functional equivalent forms of the polypeptides of the present invention.
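The four-group classification above lends itself to a short Python sketch that checks whether a substitution stays within one group and tallies changes between a native and a variant sequence. The one-letter codes, the function names, and the restriction to equal-length, ungapped sequences are assumptions made for illustration; cysteine and cystine share the code C here.

```python
from typing import Tuple

# Groups as listed in the text, written with one-letter amino acid codes.
GROUPS = {
    "acidic": set("DE"),
    "basic": set("RHK"),
    "neutral_polar": set("GSTCYNQ"),
    "neutral_nonpolar": set("ALIVPFWM"),
}


def group_of(residue: str) -> str:
    """Return the name of the group that contains the residue."""
    for name, members in GROUPS.items():
        if residue.upper() in members:
            return name
    raise ValueError(f"unknown residue: {residue}")


def is_conservative(native: str, substitute: str) -> bool:
    """True when the two residues differ but belong to the same group."""
    return native.upper() != substitute.upper() and group_of(native) == group_of(substitute)


def count_changes(native_seq: str, variant_seq: str) -> Tuple[int, int]:
    """Return (conservative, non_conservative) change counts for equal-length sequences."""
    if len(native_seq) != len(variant_seq):
        raise ValueError("sequences must have equal length")
    conservative = non_conservative = 0
    for a, b in zip(native_seq.upper(), variant_seq.upper()):
        if a == b:
            continue
        if is_conservative(a, b):
            conservative += 1
        else:
            non_conservative += 1
    return conservative, non_conservative
```

A variant could then be compared against the ten, seven, or five conservative-change limits mentioned above by inspecting the first element of the returned pair.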

It is understood that certain amino acids may be substituted for other amino acids in a protein structure without appreciable loss of interactive binding capacity with structures such as, for example, antigen-binding regions of antibodies or binding sites on substrate molecules. Because it is the interactive capacity and nature of a protein that defines that protein's biological functional activity, certain amino acid sequence substitutions can be made in a protein sequence and, of course, its underlying DNA coding sequence and, nevertheless, a protein with like properties can still be obtained. It is thus contemplated by the inventors that various changes may be made in the peptide sequences of the proteins or fragments of the present invention, or corresponding DNA sequences that encode said peptides, without appreciable loss of their biological utility or activity. It is understood that codons capable of coding for such amino acid changes are known in the art.

In making such changes, the hydropathic index of amino acids may be considered. The importance of the hydropathic amino acid index in conferring interactive biological function on a protein is generally understood in the art (Kyte and Doolittle, J. Mol. Biol., 157:105-132 (1982)). It is accepted that the relative hydropathic character of the amino acid contributes to the secondary structure of the resultant polypeptide, which in turn defines the interaction of the protein with other molecules, for example, enzymes, substrates, receptors, DNA, antibodies, antigens, and the like.

Each amino acid has been assigned a hydropathic index on the basis of its hydrophobicity and charge characteristics (Kyte and Doolittle, J. Mol. Biol., 157:105-132 (1982)); these are isoleucine (+4.5), valine (+4.2), leucine (+3.8), phenylalanine (+2.8), cysteine/cystine (+2.5), methionine (+1.9), alanine (+1.8), glycine (−0.4), threonine (−0.7), serine (−0.8), tryptophan (−0.9), tyrosine (−1.3), proline (−1.6), histidine (−3.2), glutamate (−3.5), glutamine (−3.5), aspartate (−3.5), asparagine (−3.5), lysine (−3.9), and arginine (−4.5).

In making such changes, the substitution of amino acids whose hydropathic indices are within ±2 is preferred, those that are within ±1 are particularly preferred, and those within ±0.5 are even more particularly preferred.
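The ±2, ±1, and ±0.5 windows can be applied mechanically, as in the following Python sketch. The table uses the Kyte and Doolittle hydropathic indices (arginine is −4.5 on that scale); the one-letter codes and the function name are illustrative assumptions.

```python
from typing import Optional

# Kyte and Doolittle (1982) hydropathic indices, keyed by one-letter code.
KD = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hydropathy_window(a: str, b: str) -> Optional[float]:
    """Return the tightest window (0.5, 1.0, or 2.0) containing the index difference.

    Returns None when the difference exceeds 2. The difference is rounded to
    one decimal because the tabulated indices carry one decimal place.
    """
    diff = round(abs(KD[a.upper()] - KD[b.upper()]), 1)
    for window in (0.5, 1.0, 2.0):
        if diff <= window:
            return window
    return None
```

Isoleucine to valine (difference 0.3) falls in the tightest window, lysine to arginine (0.6) in the ±1 window, and isoleucine to arginine (9.0) in none.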

It is also understood in the art that the substitution of like amino acids can be made effectively on the basis of hydrophilicity. U.S. Pat. No. 4,554,101 states that the greatest local average hydrophilicity of a protein, as governed by the hydrophilicity of its adjacent amino acids, correlates with a biological property of the protein.

As detailed in U.S. Pat. No. 4,554,101, the following hydrophilicity values have been assigned to amino acid residues: arginine (+3.0), lysine (+3.0), aspartate (+3.0±1), glutamate (+3.0±1), serine (+0.3), asparagine (+0.2), glutamine (+0.2), glycine (0), threonine (−0.4), proline (−0.5±1), alanine (−0.5), histidine (−0.5), cysteine (−1.0), methionine (−1.3), valine (−1.5), leucine (−1.8), isoleucine (−1.8), tyrosine (−2.3), phenylalanine (−2.5), and tryptophan (−3.4).

In making such changes, the substitution of amino acids whose hydrophilicity values are within ±2 is preferred, those that are within ±1 are particularly preferred, and those within ±0.5 are even more particularly preferred.
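Using the hydrophilicity values listed above, candidate substitutes for a given residue can be enumerated for any chosen window. The Python sketch below is illustrative only: the function name and one-letter codes are assumptions, and the ±1 uncertainty attached to aspartate, glutamate, and proline in the text is ignored, with the central value used instead.

```python
from typing import List

# Hydrophilicity values from the text, keyed by one-letter code.
# Assumption: the central value is used for D, E, and P (listed with +/-1).
HYDROPHILICITY = {
    "R": 3.0, "K": 3.0, "D": 3.0, "E": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "T": -0.4, "P": -0.5, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "L": -1.8, "I": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def candidate_substitutes(residue: str, window: float = 2.0) -> List[str]:
    """List residues whose hydrophilicity lies within +/-window of the given residue."""
    residue = residue.upper()
    target = HYDROPHILICITY[residue]
    return sorted(
        other
        for other, value in HYDROPHILICITY.items()
        if other != residue and round(abs(value - target), 1) <= window
    )
```

With the tightest window of 0.5, leucine yields isoleucine, methionine, valine, and tyrosine as candidates.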

In a further aspect of the present invention, one or more of the nucleic acid molecules of the present invention differ in nucleic acid sequence from those for which a specific sequence is provided herein because one or more codons has been replaced with a codon that encodes a conservative substitution of the amino acid originally encoded.

Agents of the present invention include nucleic acid molecules that encode at least about a contiguous 10 amino acid region of a polypeptide of the present invention, more preferably at least about a contiguous 25, 40, 50, 100, or 125 amino acid region of a polypeptide of the present invention.

In a preferred embodiment, any of the nucleic acid molecules of the present invention can be operably linked to a promoter region that functions in a plant cell to cause the production of an mRNA molecule, where the nucleic acid molecule that is linked to the promoter is heterologous with respect to that promoter. As used herein, “heterologous” means not naturally occurring together.

The nature of the coding sequences of non-plant genes can distinguish them from plant genes as well as many other heterologous genes expressed in plants. For example, the average A+T content of bacteria can be higher than that for plants. The A+T content of the genome (and thus the genes) of any organism is a feature of that organism and reflects its evolutionary history. While within any one organism genes have similar A+T content, the A+T content can vary tremendously from organism to organism. For example, some Bacillus species have among the most A+T rich genomes while some Streptomyces species have among the least A+T rich genomes (about 30 to 35% A+T).

Due to the degeneracy of the genetic code and the limited number of codon choices for any amino acid, most of the “excess” A+T of the structural coding sequences of some Bacillus species, for example, is found in the third position of the codons. That is, genes of some Bacillus species have A or T as the third nucleotide in many codons. Thus A+T content in part can determine codon usage bias. In addition, it is clear that genes evolve for maximum function in the organism in which they evolve. This means that particular nucleotide sequences found in a gene from one organism, where they may play no role except to code for a particular stretch of amino acids, have the potential to be recognized as gene control elements in another organism (such as transcriptional promoters or terminators, polyA addition sites, intron splice sites, or specific mRNA degradation signals). It is perhaps surprising that such misread signals are not a more common feature of heterologous gene expression, but this can be explained in part by the relatively homogeneous A+T content (about 50%) of many organisms. This A+T content plus the nature of the genetic code put clear constraints on the likelihood of occurrence of any particular oligonucleotide sequence. Thus, a gene from E. coli with a 50% A+T content is much less likely to contain any particular A+T rich segment than a gene from B. thuringiensis. The same can be true between genes in a bacterium and genes in a plant, for example.
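The two quantities discussed above, overall A+T content and the share of codons with A or T in the third position, can be computed with a few lines of Python. This is a minimal sketch with hypothetical function names; it assumes an in-frame DNA coding sequence made up of A, C, G, and T, and it silently ignores any trailing partial codon.

```python
def at_content(seq: str) -> float:
    """Fraction of bases in the sequence that are A or T."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "AT" for base in seq) / len(seq)


def third_position_at(cds: str) -> float:
    """Fraction of complete codons whose third nucleotide is A or T.

    Assumption: cds starts in frame; a trailing partial codon is ignored.
    """
    cds = cds.upper()
    third_bases = [cds[i + 2] for i in range(0, len(cds) - 2, 3)]
    if not third_bases:
        return 0.0
    return sum(base in "AT" for base in third_bases) / len(third_bases)
```

Comparing `third_position_at` with `at_content` for the same gene gives a quick indication of how much of its A+T richness sits in the third codon position.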

Any of the nucleic acid molecules of the present invention can be altered via any methods known in the art in order to make the codons within the nucleic acid molecule more appropriate for the organism in which the nucleic acid molecule is located. That is, the present invention includes the modification of any of the nucleic acid molecules disclosed herein to improve codon usage in a host organism.
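
One simple way to picture such a modification is synonymous recoding: each codon is replaced by a host-preferred codon for the same amino acid, so that the encoded polypeptide is unchanged. The sketch below is an assumption-laden illustration, not a method prescribed herein. It uses the standard genetic code, and the `preferred` mapping passed in by the caller is a hypothetical placeholder, not measured codon usage data for any organism.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate complete codons of a coding sequence ('*' marks a stop)."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def recode(cds: str, preferred: dict) -> str:
    """Replace each codon with the preferred synonymous codon, if one is given.

    `preferred` maps a one-letter amino acid code to a codon; amino acids
    without an entry keep their original codon.
    """
    cds = cds.upper()
    recoded = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        new_codon = preferred.get(amino_acid, codon)
        # Guard against a table entry that would change the protein.
        if CODON_TABLE[new_codon] != amino_acid:
            raise ValueError(f"{new_codon} does not encode {amino_acid}")
        recoded.append(new_codon)
    return "".join(recoded)


if __name__ == "__main__":
    hypothetical_preferences = {"A": "GCC", "K": "AAG"}  # placeholder values
    original = "ATGGCAAAATAA"
    modified = recode(original, hypothetical_preferences)
    print(original, "->", modified)
    print(translate(original) == translate(modified))
```

The check inside `recode` reflects the design constraint that only the nucleotide sequence, and not the amino acid sequence, is meant to change.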

It is preferred that regions comprising many consecutive A+T bases or G+C bases are disrupted, since these regions are predicted to have a higher likelihood of forming hairpin structures due to self-complementarity. Therefore, insertion of heterogeneous base pairs would reduce the likelihood of self-complementary secondary structure formation, which is known to inhibit transcription and/or translation in some organisms. In most cases, the adverse effects may be minimized by using sequences which do not contain more than five consecutive A+T or G+C.
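
A sequence can be screened for such regions before synthesis. The sketch below is illustrative only and rests on one stated assumption about the wording above: a run of "consecutive A+T" is read as a stretch made up solely of A and T bases (in any order), and likewise for G+C. The function name and the default threshold parameter are hypothetical.

```python
import re


def find_homogeneous_runs(seq: str, max_run: int = 5):
    """Return (start, end, run) for every stretch longer than `max_run`
    bases that consists only of A/T or only of G/C.

    `start` is 0-based and `end` is exclusive, as in Python slicing.
    """
    min_len = max_run + 1
    pattern = re.compile(r"[AT]{%d,}|[GC]{%d,}" % (min_len, min_len))
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    toy = "GCATTTTTTAGC"  # hypothetical toy sequence
    for start, end, run in find_homogeneous_runs(toy):
        print(f"positions {start}-{end}: {run}")
```

Each reported stretch marks a candidate site where a heterogeneous base could be introduced, ideally by a synonymous codon change so that the encoded amino acid sequence is preserved.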

Protein and Peptide Molecules

A class of agents includes one or more of the polypeptide molecules encoded by a nucleic acid agent of the present invention. A particularly preferred class of proteins is that having an amino acid sequence selected from the group consisting of SEQ ID NOs: 5, 9-11, 43-44, 57-58, and 90, and fragments thereof.

In another embodiment, the present invention includes polypeptides having a region of conserved amino acid sequence shown in any of FIGS. 2 a-2 c, 3 a-3 c, 24 a-24 b, 25 a-25 b, 33 a-33 c, 34 a-34 b, 35 a-35 b and 36. In an embodiment, the present invention includes polypeptides comprising a sequence selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95. The present invention includes and provides said substantially purified polypeptide wherein more than one amino acid sequence is selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95. In a further preferred embodiment the present invention includes polypeptides comprising two or more, three or more, or four sequences selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95.

In another embodiment, the present invention includes polypeptides having homogentisate prenyl transferase activity and a region of conserved amino acid sequence shown in any of FIGS. 2 a-2 c, 3 a-3 c, 25 a-25 c, 33 a-33 c, 34 a-34 b, 35 a-35 b and 36. In an embodiment, the present invention includes polypeptides having homogentisate prenyl transferase activity and comprising a sequence selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95. The present invention includes and provides said substantially purified polypeptide wherein more than one amino acid sequence is selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95.

In a further preferred embodiment the present invention includes polypeptides having homogentisate prenyl transferase activity and comprising two or more, three or more, or four sequences selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95.

In another embodiment, the present invention includes polypeptides having a region of conserved amino acid sequence shown in any of FIGS. 2 a-2 c, 3 a-3 c, 24 a-24 b, 25 a-25 c, 33 a-33 c, 34 a-34 b, 35 a-35 b or 36, excluding polypeptides derived from nucleic acid molecules derived from Nostoc punctiforme, Anabaena, Synechocystis, Zea mays, Glycine max, Arabidopsis thaliana, Oryza sativa, Trichodesmium erythraeum, Chloroflexus aurantiacus, wheat, leek, canola, cotton, Sulfolobus, Aeropyrum, sorghum, or tomato. In a preferred embodiment, the present invention includes polypeptides comprising a sequence selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95, excluding polypeptides derived from nucleic acid molecules derived from Nostoc punctiforme, Anabaena, Synechocystis, Zea mays, Glycine max, Arabidopsis thaliana, Oryza sativa, Trichodesmium erythraeum, Chloroflexus aurantiacus, wheat, leek, canola, cotton, or tomato. The present invention includes and provides said substantially purified polypeptide wherein more than one amino acid sequence is selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95.

In a further preferred embodiment the present invention includes polypeptides comprising two or more, three or more, or four sequences selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95, excluding polypeptides derived from nucleic acid molecules derived from Nostoc punctiforme, Anabaena, Synechocystis, Zea mays, Glycine max, Arabidopsis thaliana, Oryza sativa, Trichodesmium erythraeum, Chloroflexus aurantiacus, wheat, leek, canola, cotton, or tomato.

In another embodiment, the present invention includes polypeptides having homogentisate prenyl transferase activity and a region of conserved amino acid sequence shown in any of FIGS. 2 a-2 c, 3 a-3 c, 25 a-25 c, 33 a-33 c, 34 a-34 b, 35 a-35 b or 36, excluding polypeptides derived from nucleic acid molecules derived from Nostoc punctiforme, Anabaena, Synechocystis, Zea mays, Glycine max, Arabidopsis thaliana, Oryza sativa, Trichodesmium erythraeum, Chloroflexus aurantiacus, wheat, leek, canola, cotton, or tomato. In a preferred embodiment, the present invention includes polypeptides having homogentisate prenyl transferase activity and comprising a sequence selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95, excluding polypeptides derived from nucleic acid molecules derived from Nostoc punctiforme, Anabaena, Synechocystis, Zea mays, Glycine max, Arabidopsis thaliana, Oryza sativa, Trichodesmium erythraeum, Chloroflexus aurantiacus, wheat, leek, canola, cotton, or tomato. The present invention includes and provides said substantially purified polypeptide wherein more than one amino acid sequence is selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95.

In a further preferred embodiment the present invention includes polypeptides having homogentisate prenyl transferase activity and comprising two or more, three or more, or four sequences selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95, excluding polypeptides derived from nucleic acid molecules derived from Nostoc punctiforme, Anabaena, Synechocystis, Zea mays, Glycine max, Arabidopsis thaliana, Oryza sativa, Trichodesmium erythraeum, Chloroflexus aurantiacus, wheat, leek, canola, cotton, or tomato.

Polypeptide agents may have C-terminal or N-terminal amino acid sequence extensions. One class of N-terminal extensions employed in a preferred embodiment is plastid transit peptides. When employed, plastid transit peptides can be operatively linked to the N-terminal sequence, thereby permitting the localization of the agent polypeptides to plastids. In an embodiment of the present invention, any suitable plastid targeting sequence can be used. Where suitable, a plastid targeting sequence can be substituted for a native plastid targeting sequence, for example, for the CTP occurring natively in the tocopherol homogentisate prenyl transferase protein. In a further embodiment, a plastid targeting sequence that is heterologous to any homogentisate prenyl transferase protein or fragment described herein can be used. In a further embodiment, any suitable, modified plastid targeting sequence can be used. In another embodiment, the plastid targeting sequence is a CTP1 sequence (see WO 00/61771).

In a preferred aspect a protein of the present invention is targeted to a plastid using either a native transit peptide sequence or a heterologous transit peptide sequence. In the case of nucleic acid sequences corresponding to nucleic acid sequences of non-higher plant organisms such as cyanobacteria, such nucleic acid sequences can be modified to attach the coding sequence of the protein to a nucleic acid sequence of a plastid targeting peptide.

As used herein, the terms “protein”, “peptide molecule”, or “polypeptide” include any molecule that comprises five or more amino acids. It is well known in the art that protein, peptide, or polypeptide molecules may undergo modification, including post-translational modifications, such as, but not limited to, disulfide bond formation, glycosylation, phosphorylation, or oligomerization. Thus, as used herein, the terms “protein”, “peptide molecule”, or “polypeptide” include any protein that is modified by any biological or non-biological process. The terms “amino acid” and “amino acids” refer to all naturally occurring L-amino acids. This definition is meant to include norleucine, norvaline, ornithine, homocysteine, and homoserine.

One or more of the protein or fragments thereof, peptide molecules, or polypeptide molecules may be produced via chemical synthesis, or more preferably, by expression in a suitable bacterial or eukaryotic host. Suitable methods for expression are described by Sambrook et al., In: Molecular Cloning, A Laboratory Manual, 2nd Edition, Cold Spring Harbor Press, Cold Spring Harbor, N.Y. (1989) or similar texts.

A “protein fragment” is a peptide or polypeptide molecule whose amino acid sequence comprises a subset of the amino acid sequence of that protein. A protein or fragment thereof that comprises one or more additional peptide regions not derived from that protein is a “fusion” protein. Such molecules may be derivatized to contain carbohydrate or other moieties (such as keyhole limpet hemocyanin). Fusion protein or peptide molecules of the present invention are preferably produced via recombinant means.

Another class of agents comprises protein, peptide molecules, or polypeptide molecules, or fragments or fusions thereof comprising SEQ ID NOs: 5, 9-11, 43-44, 57-58, and 90, and fragments thereof in which conservative, non-essential, or non-relevant amino acid residues have been added, replaced, or deleted. Computerized means for designing modifications in protein structure are known in the art (Dahiyat and Mayo, Science, 278:82-87 (1997)).

A protein, peptide, or polypeptide of the present invention can also be a homolog protein, peptide, or polypeptide. As used herein, a homolog protein, peptide, or polypeptide or fragment thereof is a counterpart protein, peptide, or polypeptide or fragment thereof in a second species. A homolog can also be generated by molecular evolution or DNA shuffling techniques, so that the molecule retains at least one functional or structural characteristic of the original (see, for example, U.S. Pat. No. 5,811,238).

In another embodiment, the homolog is selected from the group consisting of alfalfa, Arabidopsis, barley, broccoli, cabbage, canola, citrus, cotton, garlic, oat, Allium, flax, an ornamental plant, peanut, pepper, potato, rapeseed, rice, rye, sorghum, strawberry, sugarcane, sugarbeet, tomato, wheat, poplar, pine, fir, eucalyptus, apple, lettuce, lentils, grape, banana, tea, turf grasses, sunflower, soybean, corn, and Phaseolus. More particularly, preferred homologs are selected from canola, rapeseed, corn, Brassica campestris, Brassica napus, oilseed rape, soybean, crambe, mustard, castor bean, peanut, sesame, cottonseed, linseed, safflower, oil palm, flax, and sunflower. In an even more preferred embodiment, the homolog is selected from the group consisting of canola, rapeseed, corn, Brassica campestris, Brassica napus, oilseed rape, soybean, sunflower, safflower, oil palms, and peanut. In a preferred embodiment, the homolog is soybean. In a preferred embodiment, the homolog is canola. In a preferred embodiment, the homolog is oilseed rape.

In a preferred embodiment, the nucleic acid molecules of the present invention or complements and fragments of either can be utilized to obtain such homologs.

Agents of the present invention include proteins and fragments thereof comprising at least about a contiguous 10 amino acid region, preferably comprising at least about a contiguous 20 amino acid region, even more preferably comprising at least about a contiguous 25, 35, 50, 75, or 100 amino acid region of a protein of the present invention. In another preferred embodiment, the proteins of the present invention include a region of between about 10 and about 25 contiguous amino acids, more preferably between about 20 and about 50 contiguous amino acids, and even more preferably between about 40 and about 80 contiguous amino acids.

Plant Constructs and Plant Transformants

One or more of the nucleic acid molecules of the present invention may be used in plant transformation or transfection. Exogenous genetic material may be transferred into a plant cell and the plant cell regenerated into a whole, fertile, or sterile plant. Exogenous genetic material is any genetic material, whether naturally occurring or otherwise, from any source that is capable of being inserted into any organism.

In a preferred aspect of the present invention the exogenous genetic material comprises a nucleic acid sequence of the present invention, more preferably one that encodes homogentisate prenyl transferase. In another preferred aspect of the present invention the exogenous genetic material of the present invention comprises a nucleic acid sequence encoding an amino acid sequence selected from the group consisting of SEQ ID NOs: 5, 9-11, 43-44, 57-58, and 90, and complements thereof and fragments of either. In a further aspect of the present invention the exogenous genetic material comprises a nucleic acid sequence encoding an amino acid sequence selected from the group consisting of SEQ ID NOs: 5, 9-11, 43-44, 57-58, and 90, and fragments of SEQ ID NOs: 5, 9-11, 43-44, 57-58, and 90.

In an embodiment of the present invention, exogenous genetic material encoding a homogentisate prenyl transferase enzyme or fragment thereof is introduced into a plant with one or more additional genes. In one embodiment, preferred combinations of genes include a nucleic acid molecule of the present invention and one or more of the following genes: tyrA (e.g., WO 02/089561 and Xia et al., J. Gen. Microbiol., 138:1309-1316 (1992)), tocopherol cyclase (e.g., WO 01/79472), prephenate dehydrogenase, dxs (e.g., Lois et al., Proc. Natl. Acad. Sci. (U.S.A.), 95(5):2105-2110 (1998)), dxr (e.g., U.S. Pub. 2002/0108814A and Takahashi et al., Proc. Natl. Acad. Sci. (U.S.A.), 95(17):9879-9884 (1998)), GGPPS (e.g., Bartley and Scolnik, Plant Physiol., 104:1469-1470 (1994)), HPPD (e.g., Norris et al., Plant Physiol., 117:1317-1323 (1998)), GMT (e.g., U.S. application Ser. No. 10/219,810, filed Aug. 16, 2002), tMT2 (e.g., U.S. application Ser. No. 10/279,029, filed Oct. 24, 2002), AANT1 (e.g., WO 02/090506), IDI (E.C.:5.3.3.2; Blanc et al., In: Plant Gene Register, PRG96-036; and Sato et al., DNA Res., 4:215-230 (1997)), GGH (Graßes et al., Planta, 213:620 (2001)), or a plant ortholog and an antisense construct for homogentisic acid dioxygenase (Kridl et al., Seed Sci. Res., 1:209-219 (1991); Keegstra, Cell, 56(2):247-53 (1989); Nawrath et al., Proc. Natl. Acad. Sci. (U.S.A.), 91:12760-12764 (1994); Cyanobase, www.kazusa.or.jp/cyanobase; Smith et al., Plant J., 11:83-92 (1997); WO 00/32757; ExPASy Molecular Biology Server, http://us.expasy.org/enzyme; MT1 WO 00/10380; gcpE, WO 02/12478; Saint Guily et al., Plant Physiol., 100(2):1069-1071 (1992); Sato et al., J. DNA Res., 7(1):31-63 (2000)). In such combinations, in some crop plants, e.g., canola, a preferred promoter is a napin promoter and a preferred plastid targeting sequence is a CTP1 sequence. It is preferred that gene products are targeted to the plastid.

In a preferred combination a nucleic acid molecule encoding a homogentisate prenyl transferase polypeptide and a nucleic acid molecule encoding any of the following enzymes: tyrA, prephenate dehydrogenase, tocopherol cyclase, dxs, dxr, GGPPS, HPPD, tMT2, MT1, GCPE, AANT1, IDI, GGH, GMT, or a plant ortholog and an antisense construct for homogentisic acid dioxygenase are introduced into a plant.

For any of the above combinations, a nucleic acid molecule encoding a homogentisate prenyl transferase polypeptide encodes a polypeptide comprising a sequence selected from the group consisting of SEQ ID NOs: 5, 9-11, 43-44, 57-58, and 90. In another preferred embodiment, a nucleic acid molecule encoding a homogentisate prenyl transferase polypeptide encodes a polypeptide comprising one or more of SEQ ID NOs: 39-42, 46-49, and 92-95. In a preferred embodiment, the homogentisate prenyl transferase polypeptide does not have an amino acid sequence that is derived from a nucleic acid derived from Nostoc punctiforme, Anabaena, Synechocystis, Zea mays, Glycine max, Arabidopsis thaliana, Oryza sativa, wheat, leek, canola, cotton, or tomato.

Such genetic material may be transferred into either monocotyledons or dicotyledons including, but not limited to, canola, corn, soybean, Arabidopsis, Phaseolus, peanut, alfalfa, wheat, rice, oat, sorghum, rapeseed, rye, tritordeum, millet, fescue, perennial ryegrass, sugarcane, cranberry, papaya, banana, safflower, oil palms, flax, muskmelon, apple, cucumber, dendrobium, gladiolus, chrysanthemum, liliacea, cotton, eucalyptus, sunflower, Brassica campestris, Brassica napus, oilseed rape, turfgrass, sugarbeet, coffee and dioscorea (Christou, In: Particle Bombardment for Genetic Engineering of Plants, Biotechnology Intelligence Unit, Academic Press, San Diego, Calif. (1996)), with canola, corn, Brassica campestris, Brassica napus, oilseed rape, rapeseed, soybean, crambe, mustard, castor bean, peanut, sesame, cottonseed, linseed, safflower, oil palm, flax, and sunflower preferred, and canola, rapeseed, corn, Brassica campestris, Brassica napus, oilseed rape, soybean, sunflower, safflower, oil palms, and peanut preferred. In a more preferred embodiment, the genetic material is transferred into canola. In another more preferred embodiment, the genetic material is transferred into oilseed rape. In another particularly preferred embodiment, the genetic material is transferred into soybean.

Transfer of a nucleic acid molecule that encodes a protein can result in expression or overexpression of that polypeptide in a transformed cell or transgenic plant. One or more of the proteins or fragments thereof encoded by nucleic acid molecules of the present invention may be overexpressed in a transformed cell or transformed plant. Such expression or overexpression may be the result of transient or stable transfer of the exogenous genetic material.

In a preferred embodiment, expression or overexpression of a polypeptide of the present invention in a plant provides in that plant, relative to an untransformed plant with a similar genetic background, an increased level of tocopherols.

In a preferred embodiment, expression or overexpression of a polypeptide of the present invention in a plant provides in that plant, relative to an untransformed plant with a similar genetic background, an increased level of α-tocopherols.

In a preferred embodiment, expression or overexpression of a polypeptide of the present invention in a plant provides in that plant, relative to an untransformed plant with a similar genetic background, an increased level of γ-tocopherols.

In a preferred embodiment, expression or overexpression of a polypeptide of the present invention in a plant provides in that plant, relative to an untransformed plant with a similar genetic background, an increased level of δ-tocopherols.

In a preferred embodiment, expression or overexpression of a polypeptide of the present invention in a plant provides in that plant, relative to an untransformed plant with a similar genetic background, an increased level of β-tocopherols.

In a preferred embodiment, expression or overexpression of a polypeptide of the present invention in a plant provides in that plant, relative to an untransformed plant with a similar genetic background, an increased level of tocotrienols.

In a preferred embodiment, expression or overexpression of a polypeptide of the present invention in a plant provides in that plant, relative to an untransformed plant with a similar genetic background, an increased level of α-tocotrienols.

In a preferred embodiment, expression or overexpression of a polypeptide of the present invention in a plant provides in that plant, relative to an untransformed plant with a similar genetic background, an increased level of γ-tocotrienols.

In a preferred embodiment, expression or overexpression of a polypeptide of the present invention in a plant provides in that plant, relative to an untransformed plant with a similar genetic background, an increased level of δ-tocotrienols.

In a preferred embodiment, expression or overexpression of a polypeptide of the present invention in a plant provides in that plant, relative to an untransformed plant with a similar genetic background, an increased level of β-tocotrienols.

In a preferred embodiment, expression or overexpression of a polypeptide of the present invention in a plant provides in that plant, relative to an untransformed plant with a similar genetic background, an increased level of plastoquinols.

In any of the embodiments described herein, an increase in γ-tocopherol, α-tocopherol, or both can lead to a decrease in the relative proportion of β-tocopherol, δ-tocopherol, or both. Similarly, an increase in γ-tocotrienol, α-tocotrienol, or both can lead to a decrease in the relative proportion of β-tocotrienol, δ-tocotrienol, or both.

In another embodiment, expression or overexpression of a polypeptide of the present invention in a plant provides in that plant, or a tissue of that plant, relative to an untransformed plant or plant tissue with a similar genetic background, an increased level of a homogentisate prenyl transferase protein or fragment thereof.

In some embodiments, the levels of one or more products of the tocopherol biosynthesis pathway, including any one or more of tocopherols, α-tocopherols, γ-tocopherols, δ-tocopherols, β-tocopherols, tocotrienols, α-tocotrienols, γ-tocotrienols, δ-tocotrienols, and β-tocotrienols, are increased by greater than about 10%, or more preferably greater than about 25%, 35%, 50%, 75%, 80%, 90%, 100%, 150%, 200%, 1,000%, 2,000%, or 2,500%. The levels of products may be increased throughout an organism such as a plant or localized in one or more specific organs or tissues of the organism. For example, the levels of products may be increased in one or more of the tissues and organs of a plant including without limitation: roots, tubers, stems, leaves, stalks, fruit, berries, nuts, bark, pods, seeds and flowers. A preferred organ is a seed.

In some embodiments, the levels of one or more products of the tocopherol biosynthesis pathway, including any one or more of tocopherols, α-tocopherols, γ-tocopherols, δ-tocopherols, β-tocopherols, tocotrienols, α-tocotrienols, γ-tocotrienols, δ-tocotrienols, and β-tocotrienols, are increased so that they constitute greater than about 10%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the total tocopherol content of the organism or tissue. The levels of products may be increased throughout an organism such as a plant or localized in one or more specific organs or tissues of the organism. For example, the levels of products may be increased in one or more of the tissues and organs of a plant including without limitation: roots, tubers, stems, leaves, stalks, fruit, berries, nuts, bark, pods, seeds and flowers. A preferred organ is a seed.
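
The two measures used above, a percentage increase relative to an untransformed control and the share of a product in the total tocopherol content, are distinct calculations. The sketch below illustrates both. It is an illustration only: the function names are hypothetical, and the numeric values are made-up placeholders, not measurements reported herein.

```python
def percent_increase(control: float, transformed: float) -> float:
    """Increase of the transformed level over the control level, in percent."""
    if control <= 0:
        raise ValueError("control level must be positive")
    return (transformed - control) / control * 100.0


def share_of_total(levels: dict) -> dict:
    """Each product's level as a percentage of the summed levels."""
    total = sum(levels.values())
    if total <= 0:
        raise ValueError("total level must be positive")
    return {name: value / total * 100.0 for name, value in levels.items()}


if __name__ == "__main__":
    # Placeholder numbers in arbitrary units, chosen only for illustration.
    print(percent_increase(control=10.0, transformed=35.0))
    print(share_of_total({"alpha-tocopherol": 5.0, "gamma-tocopherol": 15.0}))
```

A level that rises from 10 to 35 units is a 250% increase, whereas a product present at 15 units out of a total of 20 constitutes 75% of the total.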

In a preferred embodiment, expression of enzymes involved in tocopherol, tocotrienol or plastoquinol synthesis in the seed will result in an increase in γ-tocopherol levels due to the absence of significant levels of GMT activity in those tissues. In another preferred embodiment, expression of enzymes involved in tocopherol, tocotrienol, or plastoquinol synthesis in photosynthetic tissues will result in an increase in α-tocopherol due to the higher levels of GMT activity in those tissues relative to the same activity in seed tissue.

In another preferred embodiment, the expression of enzymes involved in tocopherol, tocotrienol, or plastoquinol synthesis in the seed will result in an increase in the total tocopherol, tocotrienol, or plastoquinol level in the plant.

In some embodiments, the levels of tocopherols or a species such as α-tocopherol may be altered. In some embodiments, the levels of tocotrienols may be altered. Such alteration can be compared to a plant with a similar genetic background.

In another embodiment, either the α-tocopherol level, α-tocotrienol level, or both of plants that natively produce high levels of either α-tocopherol, α-tocotrienol or both (e.g., sunflowers), can be increased by the introduction of a gene coding for a homogentisate prenyl transferase enzyme.

In a preferred aspect, a similar genetic background is a background where the organisms being compared share about 50% or greater of their nuclear genetic material. In a more preferred aspect a similar genetic background is a background where the organisms being compared share about 75% or greater, even more preferably about 90% or greater of their nuclear genetic material. In another even more preferable aspect, a similar genetic background is a background where the organisms being compared are plants, and the plants are isogenic except for any genetic material originally introduced using plant transformation techniques.

In another preferred embodiment, expression or overexpression of a polypeptide of the present invention in a transformed plant may provide tolerance to a variety of stresses, e.g., oxidative stress tolerance such as to oxygen or ozone, UV tolerance, cold tolerance, or fungal/microbial pathogen tolerance.

As used herein in a preferred aspect, a tolerance or resistance to stress is determined by the ability of a plant, when challenged by a stress such as cold, to produce a plant having a higher yield than one without such tolerance or resistance to stress. In a particularly preferred aspect of the present invention, the tolerance or resistance to stress is measured relative to a plant with a similar genetic background to the tolerant or resistant plant except that the plant reduces the expression, expresses, or overexpresses a protein or fragment thereof of the present invention.

Exogenous genetic material may be transferred into a host cell by the use of a DNA vector or construct designed for such a purpose. Design of such a vector is generally within the skill of the art (see, Plant Molecular Biology: A Laboratory Manual, Clark (ed.), Springer, NY (1997)).

A construct or vector may include a plant promoter to express the polypeptide of choice. In a preferred embodiment, any nucleic acid molecules described herein can be operably linked to a promoter region which functions in a plant cell to cause the production of an mRNA molecule. For example, any promoter that functions in a plant cell to cause the production of an mRNA molecule, such as those promoters described herein, without limitation, can be used. In a preferred embodiment, the promoter is a plant promoter.

A number of promoters that are active in plant cells have been described in the literature. These include the nopaline synthase (NOS) promoter (Ebert et al., Proc. Natl. Acad. Sci. (U.S.A.), 84:5745-5749 (1987)), the octopine synthase (OCS) promoter (which is carried on tumor-inducing plasmids of Agrobacterium tumefaciens), the caulimovirus promoters such as the cauliflower mosaic virus (CaMV) 19S promoter (Lawton et al., Plant Mol. Biol., 9:315-324 (1987)) and the CaMV 35S promoter (Odell et al., Nature, 313:810-812 (1985)), the figwort mosaic virus 35S-promoter, the light-inducible promoter from the small subunit of ribulose-1,5-bis-phosphate carboxylase (ssRUBISCO), the Adh promoter (Walker et al., Proc. Natl. Acad. Sci. (U.S.A.), 84:6624-6628 (1987)), the sucrose synthase promoter (Yang et al., Proc. Natl. Acad. Sci. (U.S.A.), 87:4144-4148 (1990)), the R gene complex promoter (Chandler et al., The Plant Cell, 1:1175-1183 (1989)) and the chlorophyll a/b binding protein gene promoter, etc. These promoters have been used to create DNA constructs that have been expressed in plants; see, e.g., WO 84/02913. The CaMV 35S promoters are preferred for use in plants. Promoters known or found to cause transcription of DNA in plant cells can be used in the present invention.

For the purpose of expression in source tissues of the plant, such as the leaf, seed, root or stem, it is preferred that the promoters utilized have relatively high expression in these specific tissues. Tissue-specific expression of a protein of the present invention is a particularly preferred embodiment. For this purpose, one may choose from a number of promoters for genes with tissue- or cell-specific or enhanced expression. Examples of such promoters reported in the literature include the chloroplast glutamine synthetase GS2 promoter from pea (Edwards et al., Proc. Natl. Acad. Sci. (U.S.A.), 87:3459-3463 (1990)), the chloroplast fructose-1,6-biphosphatase (FBPase) promoter from wheat (Lloyd et al., Mol. Gen. Genet., 225:209-216 (1991)), the nuclear photosynthetic ST-LS1 promoter from potato (Stockhaus et al., EMBO J., 8:2445-2451 (1989)), the serine/threonine kinase (PAL) promoter and the glucoamylase (CHS) promoter from Arabidopsis thaliana. Also reported to be active in photosynthetically active tissues are the ribulose-1,5-bisphosphate carboxylase (RbcS) promoter from eastern larch (Larix laricina), the promoter for the cab gene, cab6, from pine (Yamamoto et al., Plant Cell Physiol., 35:773-778 (1994)), the promoter for the Cab-1 gene from wheat (Fejes et al., Plant Mol. Biol., 15:921-932 (1990)), the promoter for the CAB-1 gene from spinach (Lubberstedt et al., Plant Physiol., 104:997-1006 (1994)), the promoter for the cab1R gene from rice (Luan et al., Plant Cell., 4:971-981 (1992)), the pyruvate, orthophosphate dikinase (PPDK) promoter from corn (Matsuoka et al., Proc. Natl. Acad. Sci. (U.S.A.), 90:9586-9590 (1993)), the promoter for the tobacco Lhcb1*2 gene (Cerdan et al., Plant Mol. Biol., 33:245-255 (1997)), the Arabidopsis thaliana SUC2 sucrose-H+ symporter promoter (Truernit et al., Planta., 196:564-570 (1995)) and the promoter for the thylakoid membrane proteins from spinach (psaD, psaF, psaE, PC, FNR, atpC, atpD, cab, rbcS). Other promoters for the chlorophyll a/b-binding proteins may also be utilized in the present invention, such as the promoters for LhcB gene and PsbP gene from white mustard (Sinapis alba; Kretsch et al., Plant Mol. Biol., 28:219-229 (1995)).

For the purpose of expression in sink tissues of the plant, such as the tuber of the potato plant, the fruit of tomato, or the seed of corn, wheat, rice and barley, it is preferred that the promoters utilized in the present invention have relatively high expression in these specific tissues. A number of promoters for genes with tuber-specific or tuber-enhanced expression are known, including the class I patatin promoter (Bevan et al., EMBO J., 8:1899-1906 (1986); Jefferson et al., Plant Mol. Biol., 14:995-1006 (1990)), the promoter for the potato tuber ADPGPP genes, both the large and small subunits, the sucrose synthase promoter (Salanoubat and Belliard, Gene, 60:47-56 (1987), Salanoubat and Belliard, Gene, 84:181-185 (1989)), the promoter for the major tuber proteins including the 22 kd protein complexes and protease inhibitors (Hannapel, Plant Physiol., 101:703-704 (1993)), the promoter for the granule-bound starch synthase gene (GBSS) (Visser et al., Plant Mol. Biol., 17:691-699 (1991)) and other class I and II patatin promoters (Koster-Topfer et al., Mol. Gen. Genet., 219:390-396 (1989); Mignery et al., Gene, 62:27-44 (1988)).

Other promoters can also be used to express a polypeptide in specifictissues, such as seeds or fruits. Indeed, in a preferred embodiment, thepromoter used is a seed specific promoter. Examples of such promotersinclude the 5′ regulatory regions from such genes as napin (Kridl etal., Seed Sci. Res., 1:209:219 (1991)), phaseolin (Bustos et al., PlantCell, 1(9):839-853 (1989)), soybean trypsin inhibitor (Riggs et al.,Plant Cell, 1(6):609-621 (1989)), ACP (Baerson et al., Plant Mol. Biol.,22(2):255-267 (1993)), stearoyl-ACP desaturase (Slocombe et al., PlantPhysiol., 104(4):167-176 (1994)), soybean α′ subunit of β-conglycinin(soy 7s, (Chen et al., Proc. Natl. Acad. Sci., 83:8560-8564 (1986))),and oleosin (see, for example, Hong et al., Plant Mol. Biol.,34(3):549-555 (1997)). Further examples include the promoter forβ-conglycinin (Chen et al., Dev. Genet., 10:112-122 (1989)). Alsoincluded are the zeins, which are a group of storage proteins found incorn endosperm. Genomic clones for zein genes have been isolated(Pedersen et al., Cell, 29:1015-1026 (1982), and Russell et al.,Transgenic Res., 6(2):157-168) and the promoters from these clones,including the 15 kD, 16 kD, 19 kD, 22 kD, 27 kD and genes, could also beused. Other promoters known to function, for example, in corn includethe promoters for the following genes: waxy, Brittle, Shrunken 2,Branching enzymes I and II, starch synthases, debranching enzymes,oleosins, glutelins and sucrose synthases. A particularly preferredpromoter for corn endosperm expression is the promoter for the glutelingene from rice, more particularly the Osgt-1 promoter (Zheng et al.,Mol. Cell Biol., 13:5829-5842 (1993)). 
Examples of promoters suitable for expression in wheat include those promoters for the ADPglucose pyrosynthase (ADPGPP) subunits, the granule bound and other starch synthase, the branching and debranching enzymes, the embryogenesis-abundant proteins, the gliadins and the glutenins. Examples of such promoters in rice include those promoters for the ADPGPP subunits, the granule bound and other starch synthase, the branching enzymes, the debranching enzymes, sucrose synthases and the glutelins. A particularly preferred promoter is the promoter for rice glutelin, Osgt-1. Examples of such promoters for barley include those for the ADPGPP subunits, the granule bound and other starch synthase, the branching enzymes, the debranching enzymes, sucrose synthases, the hordeins, the embryo globulins and the aleurone specific proteins. A preferred promoter for expression in the seed is a napin promoter. Another preferred promoter for expression is an Arcelin 5 promoter.

Root specific promoters may also be used. An example of such a promoter is the promoter for the acid chitinase gene (Samac et al., Plant Mol. Biol., 25:587-596 (1994)). Expression in root tissue could also be accomplished by utilizing the root specific subdomains of the CaMV35S promoter that have been identified (Lam et al., Proc. Natl. Acad. Sci. (U.S.A.), 86:7890-7894 (1989)). Other root cell specific promoters include those reported by Conkling et al., Plant Physiol., 93:1203-1211 (1990).

Other preferred promoters include 7α′ (Beachy et al., EMBO J., 4:3047 (1985); Schuler et al., Nucleic Acid Res., 10(24):8225-8244 (1982)); USP 88 and enhanced USP 88 (U.S. Patent Application No. 60/377,236, filed May 3, 2002, incorporated herein by reference); and 7Sα (U.S. patent application Ser. No. 10/235,618).

Additional promoters that may be utilized are described, for example, in U.S. Pat. Nos. 5,378,619; 5,391,725; 5,428,147; 5,447,858; 5,608,144; 5,614,399; 5,633,441; 5,633,435; and 4,633,436. In addition, a tissue specific enhancer may be used (Fromm et al., The Plant Cell, 1:977-984 (1989)).

Constructs or vectors may also include, with the coding region of interest, a nucleic acid sequence that acts, in whole or in part, to terminate transcription of that region. A number of such sequences have been isolated, including the Tr7 3′ sequence and the NOS 3′ sequence (Ingelbrecht et al., The Plant Cell, 1:671-680 (1989); Bevan et al., Nucleic Acids Res., 11:369-385 (1983)). Regulatory transcript termination regions can be provided in plant expression constructs of the present invention as well. Transcript termination regions can be provided by the DNA sequence encoding the gene of interest or a convenient transcription termination region derived from a different gene source, for example, the transcript termination region that is naturally associated with the transcript initiation region. The skilled artisan will recognize that any convenient transcript termination region that is capable of terminating transcription in a plant cell can be employed in the constructs of the present invention.

A vector or construct may also include regulatory elements. Examples of such include the Adh intron 1 (Callis et al., Genes and Develop., 1:1183-1200 (1987)), the sucrose synthase intron (Vasil et al., Plant Physiol., 91:1575-1579 (1989)) and the TMV omega element (Gallie et al., The Plant Cell, 1:301-311 (1989)). These and other regulatory elements may be included when appropriate.

A vector or construct may also include a selectable marker. Selectable markers may also be used to select for plants or plant cells that contain the exogenous genetic material. Examples of such include, but are not limited to: a neo gene (Potrykus et al., Mol. Gen. Genet., 199:183-188 (1985)), which codes for kanamycin resistance and can be selected for using kanamycin, RptII, G418, hpt etc.; a bar gene which codes for bialaphos resistance; a mutant EPSP synthase gene, which encodes glyphosate resistance (Hinchee et al., Bio/Technology, 6:915-922 (1988); Reynaerts et al., Selectable and Screenable Markers. In: Gelvin and Schilperoort, Plant Molecular Biology Manual, Kluwer, Dordrecht (1988)); aadA (Jones et al., Mol. Gen. Genet. (1987)), which confers spectinomycin and streptomycin resistance; a nitrilase gene which confers resistance to bromoxynil (Stalker et al., J. Biol. Chem., 263:6310-6314 (1988)); a mutant acetolactate synthase gene (ALS) which confers imidazolinone or sulphonylurea resistance (EP 0 154 204 (Sep. 11, 1985)), ALS (D'Halluin et al., Bio/technology, 10:309-314 (1992)), and a methotrexate resistant DHFR gene (Thillet et al., J. Biol. Chem., 263:12500-12508 (1988)).

A vector or construct may also include a transit peptide. Incorporation of a suitable chloroplast transit peptide may also be employed (EP 0 218 571). Translational enhancers may also be incorporated as part of the vector DNA. DNA constructs could contain one or more 5′ non-translated leader sequences, which may serve to enhance expression of the gene products from the resulting mRNA transcripts. Such sequences may be derived from the promoter selected to express the gene or can be specifically modified to increase translation of the mRNA. Such regions may also be obtained from viral RNAs, from suitable eukaryotic genes, or from a synthetic gene sequence. For a review of optimizing expression of transgenes, see Koziel et al., Plant Mol. Biol., 32:393-405 (1996). A preferred transit peptide is CTP1.

A vector or construct may also include a screenable marker. Screenable markers may be used to monitor expression. Exemplary screenable markers include: a β-glucuronidase or uidA gene (GUS) which encodes an enzyme for which various chromogenic substrates are known (Jefferson, Plant Mol. Biol. Rep., 5:387-405 (1987); Jefferson et al., EMBO J., 6:3901-3907 (1987)); an R-locus gene, which encodes a product that regulates the production of anthocyanin pigments (red color) in plant tissues (Dellaporta et al., Stadler Symposium, 11:263-282 (1988)); a β-lactamase gene (Sutcliffe et al., Proc. Natl. Acad. Sci. (U.S.A.), 75:3737-3741 (1978)), a gene which encodes an enzyme for which various chromogenic substrates are known (e.g., PADAC, a chromogenic cephalosporin); a luciferase gene (Ow et al., Science, 234:856-859 (1986)); a xylE gene (Zukowsky et al., Proc. Natl. Acad. Sci. (U.S.A.), 80:1101-1105 (1983)) which encodes a catechol dioxygenase that can convert chromogenic catechols; an α-amylase gene (Ikatu et al., Bio/Technol., 8:241-242 (1990)); a tyrosinase gene (Katz et al., J. Gen. Microbiol., 129:2703-2714 (1983)) which encodes an enzyme capable of oxidizing tyrosine to DOPA and dopaquinone which in turn condenses to melanin; and an α-galactosidase, which will convert a chromogenic α-galactose substrate.

Included within the terms “selectable or screenable marker genes” are also genes that encode a secretable marker whose secretion can be detected as a means of identifying or selecting for transformed cells. Examples include markers that encode a secretable antigen that can be identified by antibody interaction, or even secretable enzymes that can be detected catalytically. Secretable proteins fall into a number of classes, including small, diffusible proteins that are detectable (e.g., by ELISA), small active enzymes that are detectable in extracellular solution (e.g., α-amylase, β-lactamase, phosphinothricin transferase), or proteins that are inserted or trapped in the cell wall (such as proteins that include a leader sequence such as that found in the expression unit of extensin or tobacco PR-S). Other possible selectable and/or screenable marker genes will be apparent to those of skill in the art.

There are many methods for introducing transforming nucleic acid molecules into plant cells. Suitable methods are believed to include virtually any method by which nucleic acid molecules may be introduced into a cell, such as by Agrobacterium infection or direct delivery of nucleic acid molecules, for example, by PEG-mediated transformation, by electroporation or by acceleration of DNA coated particles, and the like (Potrykus, Ann. Rev. Plant Physiol. Plant Mol. Biol., 42:205-225 (1991); Vasil, Plant Mol. Biol., 25:925-937 (1994)). For example, electroporation has been used to transform corn protoplasts (Fromm et al., Nature, 319:791-793 (1986)).

Other vector systems suitable for introducing transforming DNA into a host plant cell include but are not limited to binary artificial chromosome (BIBAC) vectors (Hamilton et al., Gene, 200:107-116 (1997)); and transfection with RNA viral vectors (Della-Cioppa et al., Ann. N.Y. Acad. Sci. (1996), 792 (Engineering Plants for Commercial Products and Applications), 57-61). Additional vector systems also include plant selectable YAC vectors such as those described in Mullen et al., Molecular Breeding, 4:449-457 (1988).

Technology for introduction of DNA into cells is well known to those of skill in the art. Four general methods for delivering a gene into cells have been described: (1) chemical methods (Graham and van der Eb, Virology, 54:536-539 (1973)); (2) physical methods such as microinjection (Capecchi, Cell, 22:479-488 (1980)), electroporation (Wong and Neumann, Biochem. Biophys. Res. Commun., 107:584-587 (1982); Fromm et al., Proc. Natl. Acad. Sci. (U.S.A.), 82:5824-5828 (1985); U.S. Pat. No. 5,384,253); the gene gun (Johnston and Tang, Methods Cell Biol., 43:353-365 (1994)); and vacuum infiltration (Bechtold et al., C.R. Acad. Sci. Paris, Life Sci., 316:1194-1199 (1993)); (3) viral vectors (Clapp, Clin. Perinatol., 20:155-168 (1993); Lu et al., J. Exp. Med., 178:2089-2096 (1993); Eglitis and Anderson, Biotechniques, 6:608-614 (1988)); and (4) receptor-mediated mechanisms (Curiel et al., Hum. Gen. Ther., 3:147-154 (1992), Wagner et al., Proc. Natl. Acad. Sci. (U.S.A.), 89:6099-6103 (1992)).

Acceleration methods that may be used include, for example, microprojectile bombardment and the like. One example of a method for delivering transforming nucleic acid molecules into plant cells is microprojectile bombardment. This method has been reviewed by Yang and Christou (eds.), Particle Bombardment Technology for Gene Transfer, Oxford Press, Oxford, England (1994). Non-biological particles (microprojectiles) may be coated with nucleic acids and delivered into cells by a propelling force. Exemplary particles include those comprised of tungsten, gold, platinum and the like.

A particular advantage of microprojectile bombardment, in addition to it being an effective means of reproducibly transforming monocots, is that neither the isolation of protoplasts (Christou et al., Plant Physiol., 87:671-674 (1988)) nor the susceptibility to Agrobacterium infection is required. An illustrative embodiment of a method for delivering DNA into corn cells by acceleration is a biolistics particle delivery system, which can be used to propel particles coated with DNA through a screen, such as a stainless steel or Nytex screen, onto a filter surface covered with corn cells cultured in suspension. Gordon-Kamm et al. describe the basic procedure for coating tungsten particles with DNA (Gordon-Kamm et al., Plant Cell, 2:603-618 (1990)). The screen disperses the tungsten nucleic acid particles so that they are not delivered to the recipient cells in large aggregates. A particle delivery system suitable for use with the present invention is the helium acceleration PDS-1000/He gun, which is available from Bio-Rad Laboratories (Bio-Rad, Hercules, Calif.) (Sanford et al., Technique, 3:3-16 (1991)).

For the bombardment, cells in suspension may be concentrated on filters. Filters containing the cells to be bombarded are positioned at an appropriate distance below the microprojectile stopping plate. If desired, one or more screens are also positioned between the gun and the cells to be bombarded.

Alternatively, immature embryos or other target cells may be arranged on solid culture medium. The cells to be bombarded are positioned at an appropriate distance below the microprojectile stopping plate. If desired, one or more screens are also positioned between the acceleration device and the cells to be bombarded. Through the use of techniques set forth herein one may obtain 1000 or more foci of cells transiently expressing a marker gene. The number of cells in a focus that express the exogenous gene product 48 hours post-bombardment often ranges from one to ten, and averages one to three.

In bombardment transformation, one may optimize the pre-bombardment culturing conditions and the bombardment parameters to yield the maximum numbers of stable transformants. Both the physical and biological parameters for bombardment are important in this technology. Physical factors are those that involve manipulating the DNA/microprojectile precipitate or those that affect the flight and velocity of either the macro- or microprojectiles. Biological factors include all steps involved in manipulation of cells before and immediately after bombardment, the osmotic adjustment of target cells to help alleviate the trauma associated with bombardment and also the nature of the transforming DNA, such as linearized DNA or intact supercoiled plasmids. It is believed that pre-bombardment manipulations are especially important for successful transformation of immature embryos.

In another alternative embodiment, plastids can be stably transformed. Methods disclosed for plastid transformation in higher plants include the particle gun delivery of DNA containing a selectable marker and targeting of the DNA to the plastid genome through homologous recombination (Svab et al., Proc. Natl. Acad. Sci. (U.S.A.), 87:8526-8530 (1990); Svab and Maliga, Proc. Natl. Acad. Sci. (U.S.A.), 90:913-917 (1993); Staub and Maliga, EMBO J., 12:601-606 (1993); U.S. Pat. Nos. 5,451,513 and 5,545,818).

Accordingly, it is contemplated that one may wish to adjust various aspects of the bombardment parameters in small scale studies to fully optimize the conditions. One may particularly wish to adjust physical parameters such as gap distance, flight distance, tissue distance and helium pressure. One may also minimize the trauma reduction factors by modifying conditions that influence the physiological state of the recipient cells and which may therefore influence transformation and integration efficiencies. For example, the osmotic state, tissue hydration and the subculture stage or cell cycle of the recipient cells may be adjusted for optimum transformation. The execution of other routine adjustments will be known to those of skill in the art in light of the present disclosure.

Agrobacterium-mediated transfer is a widely applicable system for introducing genes into plant cells because the DNA can be introduced into whole plant tissues, thereby bypassing the need for regeneration of an intact plant from a protoplast. The use of Agrobacterium-mediated plant integrating vectors to introduce DNA into plant cells is well known in the art. See, for example, the methods described by Fraley et al., Bio/Technology, 3:629-635 (1985) and Rogers et al., Methods Enzymol., 153:253-277 (1987). Further, the integration of the Ti-DNA is a relatively precise process resulting in few rearrangements. The region of DNA to be transferred is defined by the border sequences and intervening DNA is usually inserted into the plant genome as described (Spielmann et al., Mol. Gen. Genet., 205:34 (1986)).

Modern Agrobacterium transformation vectors are capable of replication in E. coli as well as Agrobacterium, allowing for convenient manipulations as described (Klee et al., In: Plant DNA Infectious Agents, Hohn and Schell (eds.), Springer-Verlag, NY, pp. 179-203 (1985)). Moreover, technological advances in vectors for Agrobacterium-mediated gene transfer have improved the arrangement of genes and restriction sites in the vectors to facilitate construction of vectors capable of expressing various polypeptide coding genes. The vectors described have convenient multi-linker regions flanked by a promoter and a polyadenylation site for direct expression of inserted polypeptide coding genes and are suitable for present purposes (Rogers et al., Methods Enzymol., 153:253-277 (1987)). In addition, Agrobacterium containing both armed and disarmed Ti genes can be used for the transformations. In those plant strains where Agrobacterium-mediated transformation is efficient, it is the method of choice because of the facile and defined nature of the gene transfer.

A transgenic plant formed using Agrobacterium transformation methods typically contains a single gene on one chromosome. Such transgenic plants can be referred to as being heterozygous for the added gene. More preferred is a transgenic plant that is homozygous for the added structural gene; i.e., a transgenic plant that contains two added genes, one gene at the same locus on each chromosome of a chromosome pair. A homozygous transgenic plant can be obtained by sexually mating (selfing) an independent segregant, transgenic plant that contains a single added gene, germinating some of the seed produced and analyzing the resulting plants produced for the gene of interest.
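The expected outcome of such a selfing can be illustrated with a short calculation. The following is only a minimal sketch, assuming a single insertion locus, ordinary Mendelian segregation and equal gamete viability; the function name `selfing_ratios` and the allele symbols "T" (transgene present) and "-" (transgene absent) are hypothetical labels introduced here for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def selfing_ratios(parent=("T", "-")):
    """Expected progeny frequencies when a plant is selfed.

    parent: the two alleles at the insertion locus, where "T" marks the
    added gene and "-" marks its absence (hypothetical symbols).
    Returns a dict mapping transgene copy number (0, 1 or 2) to the
    expected fraction of progeny, assuming Mendelian segregation.
    """
    # Each gamete carries one of the two parental alleles with equal chance;
    # enumerate every egg/pollen combination and count transgene copies.
    counts = Counter(
        (egg, pollen).count("T") for egg, pollen in product(parent, repeat=2)
    )
    total = sum(counts.values())
    return {copies: Fraction(n, total) for copies, n in counts.items()}


if __name__ == "__main__":
    for copies, freq in sorted(selfing_ratios().items(), reverse=True):
        print(f"{copies} copies: {freq}")
```

Under these assumptions about one quarter of the progeny of a plant carrying a single added gene would be homozygous for it, which is why the resulting plants are analyzed individually for the gene of interest.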

It is also to be understood that two different transgenic plants can also be mated to produce offspring that contain two independently segregating, exogenous genes. Selfing of appropriate progeny can produce plants that are homozygous for both added, exogenous genes that encode a polypeptide of interest. Back-crossing to a parental plant and out-crossing with a non-transgenic plant are also contemplated, as is vegetative propagation.
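As a rough guide to how many progeny might need to be screened, the expected fraction of selfed progeny that is homozygous for every added gene can be estimated as follows. This is a sketch only, assuming that each added gene sits at a single locus, that the loci are unlinked, and that the selfed parent carries one copy at each locus; the function name `fraction_homozygous_all` is a hypothetical one chosen for this illustration.

```python
from fractions import Fraction


def fraction_homozygous_all(n_loci):
    """Expected fraction of selfed progeny homozygous for the insert at
    every one of n_loci unlinked loci, given a parent carrying exactly one
    copy at each locus (Mendelian segregation assumed)."""
    if n_loci < 1:
        raise ValueError("n_loci must be at least 1")
    # Each locus independently yields a homozygous insert 1/4 of the time.
    return Fraction(1, 4) ** n_loci


if __name__ == "__main__":
    print(fraction_homozygous_all(2))  # two independently segregating genes
```

For two independently segregating genes the estimate is one in sixteen, so screening of a correspondingly larger progeny population would be expected.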

Transformation of plant protoplasts can be achieved using methods based on calcium phosphate precipitation, polyethylene glycol treatment, electroporation and combinations of these treatments (see, for example, Potrykus et al., Mol. Gen. Genet., 205:193-200 (1986); Lorz et al., Mol. Gen. Genet., 199:178 (1985); Fromm et al., Nature, 319:791 (1986); Uchimiya et al., Mol. Gen. Genet., 204:204 (1986); Marcotte et al., Nature, 335:454-457 (1988)).

Application of these systems to different plant strains depends upon the ability to regenerate that particular plant strain from protoplasts. Illustrative methods for the regeneration of cereals from protoplasts are described (Fujimura et al., Plant Tissue Culture Letters, 2:74 (1985); Toriyama et al., Theor. Appl. Genet., 205:34 (1986); Yamada et al., Plant Cell Rep., 4:85 (1986); Abdullah et al., Biotechnology, 4:1087 (1986)).

To transform plant strains that cannot be successfully regenerated from protoplasts, other ways to introduce DNA into intact cells or tissues can be utilized. For example, regeneration of cereals from immature embryos or explants can be effected as described (Vasil, Biotechnology, 6:397 (1988)). In addition, “particle gun” or high-velocity microprojectile technology can be utilized (Vasil et al., Bio/Technology, 10:667 (1992)).

Using the latter technology, DNA is carried through the cell wall and into the cytoplasm on the surface of small metal particles as described (Klein et al., Nature, 328:70 (1987); Klein et al., Proc. Natl. Acad. Sci. (U.S.A.), 85:8502-8505 (1988); McCabe et al., Bio/Technology, 6:923 (1988)). The metal particles penetrate through several layers of cells and thus allow the transformation of cells within tissue explants.

Other methods of cell transformation can also be used and include but are not limited to introduction of DNA into plants by direct DNA transfer into pollen (Hess et al., Intern Rev. Cytol., 107:367 (1987); Luo et al., Plant Mol Biol. Reporter, 6:165 (1988)), by direct injection of DNA into reproductive organs of a plant (Pena et al., Nature, 325:274 (1987)), or by direct injection of DNA into the cells of immature embryos followed by the rehydration of desiccated embryos (Neuhaus et al., Theor. Appl. Genet., 75:30 (1987)).

The regeneration, development and cultivation of plants from single plant protoplast transformants or from various transformed explants is well known in the art (Weissbach and Weissbach, In: Methods for Plant Molecular Biology, Academic Press, San Diego, Calif., (1988)). This regeneration and growth process typically includes the steps of selecting transformed cells and culturing those individualized cells through the usual stages of embryonic development through the rooted plantlet stage. Transgenic embryos and seeds are similarly regenerated. The resulting transgenic rooted shoots are thereafter planted in an appropriate plant growth medium such as soil.

The development or regeneration of plants containing the foreign, exogenous gene that encodes a protein of interest is well known in the art. Preferably, the regenerated plants are self-pollinated to provide homozygous transgenic plants. Otherwise, pollen obtained from the regenerated plants is crossed to seed-grown plants of agronomically important lines. Conversely, pollen from plants of these important lines is used to pollinate regenerated plants. A transgenic plant of the present invention containing a desired polypeptide is cultivated using methods well known to one skilled in the art.

There are a variety of methods for the regeneration of plants from plant tissue. The particular method of regeneration will depend on the starting plant tissue and the particular plant species to be regenerated.

Methods for transforming dicots, primarily by use of Agrobacterium tumefaciens, and obtaining transgenic plants have been published for cotton (U.S. Pat. Nos. 5,004,863; 5,159,135; and 5,518,908); soybean (U.S. Pat. Nos. 5,569,834 and 5,416,011; McCabe et al., Biotechnology, 6:923 (1988); Christou et al., Plant Physiol., 87:671-674 (1988)); Brassica (U.S. Pat. No. 5,463,174); peanut (Cheng et al., Plant Cell Rep., 15:653-657 (1996), McKently et al., Plant Cell Rep., 14:699-703 (1995)); papaya; pea (Grant et al., Plant Cell Rep., 15:254-258 (1995)); and Arabidopsis thaliana (Bechtold et al., C.R. Acad. Sci. Paris, Life Sci., 316:1194-1199 (1993)). The latter method for transforming Arabidopsis thaliana is commonly called “dipping” or vacuum infiltration or germplasm transformation.

Transformation of monocotyledons using electroporation, particle bombardment and Agrobacterium has also been reported. Transformation and plant regeneration have been achieved in asparagus (Bytebier et al., Proc. Natl. Acad. Sci. (U.S.A.), 84:5354 (1987)); barley (Wan and Lemaux, Plant Physiol, 104:37 (1994)); corn (Rhodes et al., Science, 240:204 (1988); Gordon-Kamm et al., Plant Cell, 2:603-618 (1990); Fromm et al., Bio/Technology, 8:833 (1990); Koziel et al., Bio/Technology, 11:194 (1993); Armstrong et al., Crop Science, 35:550-557 (1995)); oat (Somers et al., Bio/Technology, 10:1589 (1992)); orchard grass (Horn et al., Plant Cell Rep., 7:469 (1988)); rice (Toriyama et al., Theor Appl. Genet., 205:34 (1986); Park et al., Plant Mol. Biol., 32:1135-1148 (1996); Abedinia et al., Aust. J. Plant Physiol., 24:133-141 (1997); Zhang and Wu, Theor. Appl. Genet., 76:835 (1988); Zhang et al., Plant Cell Rep., 7:379 (1988); Battraw and Hall, Plant Sci., 86:191-202 (1992); Christou et al., Bio/Technology, 9:957 (1991)); rye (De la Pena et al., Nature, 325:274 (1987)); sugarcane (Bower and Birch, Plant J., 2:409 (1992)); tall fescue (Wang et al., Bio/Technology, 10:691 (1992)); and wheat (Vasil et al., Bio/Technology, 10:667 (1992); U.S. Pat. No. 5,631,152).

Assays for gene expression based on the transient expression of cloned nucleic acid constructs have been developed by introducing the nucleic acid molecules into plant cells by polyethylene glycol treatment, electroporation, or particle bombardment (Marcotte et al., Nature, 335:454-457 (1988); Marcotte et al., Plant Cell, 1:523-532 (1989); McCarty et al., Cell, 66:895-905 (1991); Hattori et al., Genes Dev., 6:609-618 (1992); Goff et al., EMBO J., 9:2517-2522 (1990)). Transient expression systems may be used to functionally dissect gene constructs (see generally, Maliga et al., Methods in Plant Molecular Biology, Cold Spring Harbor Press, NY (1995)).

Any of the nucleic acid molecules of the present invention may be introduced into a plant cell in a permanent or transient manner in combination with other genetic elements such as vectors, promoters, enhancers, etc. Further, any of the nucleic acid molecules of the present invention may be introduced into a plant cell in a manner that allows for expression or overexpression of the protein or fragment thereof encoded by the nucleic acid molecule.

Cosuppression is the reduction in expression levels, usually at the level of RNA, of a particular endogenous gene or gene family by the expression of a homologous sense construct that is capable of transcribing mRNA of the same strandedness as the transcript of the endogenous gene (Napoli et al., Plant Cell, 2:279-289 (1990); van der Krol et al., Plant Cell, 2:291-299 (1990)). Cosuppression may result from stable transformation with a single copy nucleic acid molecule that is homologous to a nucleic acid sequence found within the cell (Prolls and Meyer, Plant J., 2:465-475 (1992)) or with multiple copies of a nucleic acid molecule that is homologous to a nucleic acid sequence found within the cell (Mittlesten et al., Mol. Gen. Genet., 244:325-330 (1994)). Genes, even though different, linked to homologous promoters may result in the cosuppression of the linked genes (Vaucheret, C.R. Acad. Sci. III, 316:1471-1483 (1993); Flavell, Proc. Natl. Acad. Sci. (U.S.A.), 91:3490-3496 (1994); van Blokland et al., Plant J., 6:861-877 (1994); Jorgensen, Trends Biotechnol., 8:340-344 (1990); Meins and Kunz, In: Gene Inactivation and Homologous Recombination in Plants, Paszkowski (ed.), pp. 335-348, Kluwer Academic, Netherlands (1994)).

It is understood that one or more of the nucleic acids of the present invention may be introduced into a plant cell and transcribed using an appropriate promoter with such transcription resulting in the cosuppression of an endogenous protein.

Antisense approaches are a way of preventing or reducing gene function by targeting the genetic material (Mol et al., FEBS Lett., 268:427-430 (1990)). The objective of the antisense approach is to use a sequence complementary to the target gene to block its expression and create a mutant cell line or organism in which the level of a single chosen protein is selectively reduced or abolished. Antisense techniques have several advantages over other “reverse genetic” approaches. The site of inactivation and its developmental effect can be manipulated by the choice of promoter for antisense genes or by the timing of external application or microinjection. The specificity of antisense inhibition can be manipulated by selecting either unique regions of the target gene or regions where it shares homology with other related genes (Hiatt et al., In: Genetic Engineering, Setlow (ed.), Vol. 11, New York: Plenum 49-63 (1989)).

Antisense RNA techniques involve introduction of RNA that is complementary to the target mRNA into cells, which results in specific RNA:RNA duplexes being formed by base pairing between the antisense substrate and the target mRNA (Green et al., Annu. Rev. Biochem., 55:569-597 (1986)). Under one embodiment, the process involves the introduction and expression of an antisense gene sequence. Such a sequence is one in which part or all of the normal gene sequences are placed under a promoter in inverted orientation so that the “wrong” or complementary strand is transcribed into a noncoding antisense RNA that hybridizes with the target mRNA and interferes with its expression (Takayama and Inouye, Crit. Rev. Biochem. Mol. Biol., 25:155-184 (1990)). An antisense vector is constructed by standard procedures and introduced into cells by transformation, transfection, electroporation, microinjection, infection, etc. The type of transformation and choice of vector will determine whether expression is transient or stable. The promoter used for the antisense gene may influence the level, timing, tissue, specificity, or inducibility of the antisense inhibition.
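The relationship between a coding sequence placed in inverted orientation and the antisense RNA transcribed from it can be illustrated computationally. The sketch below is illustrative only: the function names are hypothetical, the six-base sequence is an arbitrary example and not a sequence of the present invention, and only the four unambiguous DNA bases are handled.

```python
# Complement table for the four unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence, i.e. the strand that
    reads 5' to 3' when the sequence is placed in inverted orientation."""
    return dna.translate(_COMPLEMENT)[::-1]


def transcribe(coding_strand):
    """Return the RNA with the same sequence as the given coding strand
    (U in place of T)."""
    return coding_strand.upper().replace("T", "U")


def antisense_rna(coding_strand):
    """Return the antisense RNA expected when the coding sequence is placed
    behind a promoter in inverted orientation."""
    return transcribe(reverse_complement(coding_strand))


if __name__ == "__main__":
    coding = "ATGGCC"  # arbitrary example sequence
    print("target mRNA:  ", transcribe(coding))
    print("antisense RNA:", antisense_rna(coding))
```

Read 3′ to 5′, the antisense RNA in this example pairs base for base with the target mRNA, which is the basis of the RNA:RNA duplex formation described above.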

It is understood that the activity of a protein in a plant cell may be reduced or depressed by growing a transformed plant cell containing a nucleic acid molecule whose non-transcribed strand encodes a protein or fragment thereof. A preferred protein whose activity can be reduced or depressed, by any method, is a homogentisate prenyl transferase.

Posttranscriptional gene silencing (PTGS) can result in virus immunity or gene silencing in plants. PTGS is induced by dsRNA and is mediated by an RNA-dependent RNA polymerase, present in the cytoplasm, which requires a dsRNA template. The dsRNA is formed by hybridization of complementary transgene mRNAs or complementary regions of the same transcript. Duplex formation can be accomplished by using transcripts from one sense gene and one antisense gene colocated in the plant genome, a single transcript that has self-complementarity, or sense and antisense transcripts from genes brought together by crossing. The dsRNA-dependent RNA polymerase makes a complementary strand from the transgene mRNA and RNase molecules attach to this complementary strand (cRNA). These cRNA-RNase molecules hybridize to the endogene mRNA and cleave the single-stranded RNA adjacent to the hybrid. The cleaved single-stranded RNAs are further degraded by other host RNases because one will lack a capped 5′ end and the other will lack a poly(A) tail (Waterhouse et al., PNAS, 95:13959-13964 (1998)).

It is understood that one or more of the nucleic acids of the present invention may be introduced into a plant cell and transcribed using an appropriate promoter with such transcription resulting in the posttranscriptional gene silencing of an endogenous transcript.

Antibodies have been expressed in plants (Hiatt et al., Nature, 342:76-78 (1989); Conrad and Fielder, Plant Mol. Biol., 26:1023-1030 (1994)). Cytoplasmic expression of a scFv (single-chain Fv antibody) has been reported to delay infection by artichoke mottled crinkle virus. Transgenic plants that express antibodies directed against endogenous proteins may exhibit a physiological effect (Philips et al., EMBO J., 16:4489-4496 (1997); Marion-Poll, Trends in Plant Science, 2:447-448 (1997)). For example, expressed anti-abscisic antibodies have been reported to result in a general perturbation of seed development (Philips et al., EMBO J., 16:4489-4496 (1997)).

Antibodies that are catalytic (abzymes) may also be expressed in plants. The principle behind abzymes is that since antibodies may be raised against many molecules, this recognition ability can be directed toward generating antibodies that bind transition states to force a chemical reaction forward (Persidas, Nature Biotechnology, 15:1313-1315 (1997); Baca et al., Ann. Rev. Biophys. Biomol. Struct., 26:461-493 (1997)). The catalytic abilities of abzymes may be enhanced by site directed mutagenesis. Examples of abzymes are set forth in U.S. Pat. Nos. 5,658,753; 5,632,990; 5,631,137; 5,602,015; 5,559,538; 5,576,174; 5,500,358; 5,318,897; 5,298,409; 5,258,289; and 5,194,585.

It is understood that any of the antibodies of the present invention may be expressed in plants and that such expression can result in a physiological effect. It is also understood that any of the expressed antibodies may be catalytic.

The present invention also provides for parts of the plants, particularly reproductive or storage parts, of the present invention. Plant parts, without limitation, include seed, endosperm, ovule and pollen. In a particularly preferred embodiment of the present invention, the plant part is a seed. In one embodiment the seed is a constituent of animal feed.

In another embodiment, the plant part is a fruit, more preferably a fruit with enhanced shelf life. In another preferred embodiment, the fruit has increased levels of a tocopherol. In another preferred embodiment, the fruit has increased levels of a tocotrienol.

The present invention also provides a container of over about 10,000, more preferably about 20,000, and even more preferably about 40,000 seeds where over about 10%, more preferably 25%, more preferably 50%, and even more preferably 75% or 90% of the seeds are seeds derived from a plant of the present invention.

The present invention also provides a container of over about 10 kg,more preferably 25 kg, and even more preferably 50 kg seeds where overabout 10%, more preferably 25%, more preferably 50%, and even morepreferably 75% or 90% of the seeds are seeds derived from a plant of thepresent invention.

Any of the plants or parts thereof of the present invention may be processed to produce a feed, meal, protein, or oil preparation, including oil preparations high in total tocopherol content and oil preparations high in any one or more of each tocopherol component listed herein. A particularly preferred plant part for this purpose is a seed. In a preferred embodiment the feed, meal, protein or oil preparation is designed for livestock animals or humans, or both. Methods to produce feed, meal, protein and oil preparations are known in the art. See, for example, U.S. Pat. Nos. 4,957,748; 5,100,679; 5,219,596; 5,936,069; 6,005,076; 6,146,669; and 6,156,227. In a preferred embodiment, the protein preparation is a high protein preparation. Such a high protein preparation preferably has a protein content of greater than about 5% w/v, more preferably 10% w/v, and even more preferably 15% w/v. In a preferred oil preparation, the oil preparation is a high oil preparation with an oil content derived from a plant or part thereof of the present invention of greater than about 5% w/v, more preferably 10% w/v, and even more preferably 15% w/v. In a preferred embodiment the oil preparation is a liquid and of a volume greater than about 1, 5, 10, or 50 liters. The present invention provides for oil produced from plants of the present invention or generated by a method of the present invention. Such an oil may exhibit enhanced oxidative stability. Also, such oil may be a minor or major component of any resultant product. Moreover, such oil may be blended with other oils. In a preferred embodiment, the oil produced from plants of the present invention or generated by a method of the present invention constitutes greater than about 0.5%, 1%, 5%, 10%, 25%, 50%, 75%, or 90% by volume or weight of the oil component of any product. In another embodiment, the oil preparation may be blended and can constitute greater than about 10%, 25%, 35%, 50%, or 75% of the blend by volume. Oil produced from a plant of the present invention can be admixed with one or more organic solvents or petroleum distillates.

Plants of the present invention can be part of or generated from a breeding program. The choice of breeding method depends on the mode of plant reproduction, the heritability of the trait(s) being improved, and the type of cultivar used commercially (e.g., F₁ hybrid cultivar, pureline cultivar, etc.). Selected, non-limiting approaches for breeding the plants of the present invention are set forth below. A breeding program can be enhanced using marker assisted selection of the progeny of any cross. It is further understood that any commercial and non-commercial cultivars can be utilized in a breeding program. Factors such as, for example, emergence vigor, vegetative vigor, stress tolerance, disease resistance, branching, flowering, seed set, seed size, seed density, standability, and threshability will generally dictate the choice.

For highly heritable traits, a choice of superior individual plants evaluated at a single location will be effective, whereas for traits with low heritability, selection should be based on mean values obtained from replicated evaluations of families of related plants. Popular selection methods commonly include pedigree selection, modified pedigree selection, mass selection, and recurrent selection. In a preferred embodiment a backcross or recurrent breeding program is undertaken.

The complexity of inheritance influences choice of the breeding method. Backcross breeding can be used to transfer one or a few favorable genes for a highly heritable trait into a desirable cultivar. This approach has been used extensively for breeding disease-resistant cultivars. Various recurrent selection techniques are used to improve quantitatively inherited traits controlled by numerous genes. The use of recurrent selection in self-pollinating crops depends on the ease of pollination, the frequency of successful hybrids from each pollination, and the number of hybrid offspring from each successful cross.

Breeding lines can be tested and compared to appropriate standards in environments representative of the commercial target area(s) for two or more generations. The best lines are candidates for new commercial cultivars; those still deficient in traits may be used as parents to produce new populations for further selection.

One method of identifying a superior plant is to observe its performance relative to other experimental plants and to a widely grown standard cultivar. If a single observation is inconclusive, replicated observations can provide a better estimate of its genetic worth. A breeder can select and cross two or more parental lines, followed by repeated selfing and selection, producing many new genetic combinations.

The development of new cultivars requires the development and selection of varieties, the crossing of these varieties and the selection of superior hybrid crosses. The hybrid seed can be produced by manual crosses between selected male-fertile parents or by using male sterility systems. Hybrids are selected for certain single gene traits such as pod color, flower color, seed yield, pubescence color, or herbicide resistance, which indicate that the seed is truly a hybrid. Additional data on parental lines, as well as the phenotype of the hybrid, influence the breeder's decision whether to continue with the specific hybrid cross.

Pedigree breeding and recurrent selection breeding methods can be used to develop cultivars from breeding populations. Breeding programs combine desirable traits from two or more cultivars or various broad-based sources into breeding pools from which cultivars are developed by selfing and selection of desired phenotypes. New cultivars can be evaluated to determine which have commercial potential.

Pedigree breeding is used commonly for the improvement of self-pollinating crops. Two parents that possess favorable, complementary traits are crossed to produce an F₁. An F₂ population is produced by selfing one or several F₁'s. Selection of the best individuals from the best families is carried out. Replicated testing of families can begin in the F₄ generation to improve the effectiveness of selection for traits with low heritability. At an advanced stage of inbreeding (i.e., F₆ and F₇), the best lines or mixtures of phenotypically similar lines are tested for potential release as new cultivars.
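As a rough illustration of why F₆ and F₇ are treated as an advanced stage of inbreeding, the expected share of loci that remain heterozygous can be sketched in Python. The sketch assumes strict self-pollination, no selection, and loci at which the two parents carried different alleles; the function name is hypothetical and is not part of the disclosure.

```python
def expected_heterozygosity(generation: int) -> float:
    """Expected fraction of segregating loci still heterozygous in the F_n generation.

    Illustrative assumptions: strict selfing from the F1 onward, no selection,
    and loci at which the two parents were homozygous for different alleles.
    """
    if generation < 1:
        raise ValueError("generation must be 1 (F1) or greater")
    # Each round of selfing halves the expected heterozygosity.
    return 0.5 ** (generation - 1)


if __name__ == "__main__":
    for n in (1, 2, 4, 6, 7):
        print(f"F{n}: {expected_heterozygosity(n):.4f}")
```

Under these assumptions about 3% of such loci are expected to remain heterozygous in the F₆ and about 1.6% in the F₇, which is consistent with testing lines for release at that stage.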

Backcross breeding has been used to transfer genes for a simply inherited, highly heritable trait into a desirable homozygous cultivar or inbred line, which is the recurrent parent. The source of the trait to be transferred is called the donor parent. After the initial cross, individuals possessing the phenotype of the donor parent are selected and repeatedly crossed (backcrossed) to the recurrent parent. The resulting plant is expected to have the attributes of the recurrent parent (e.g., cultivar) and the desirable trait transferred from the donor parent.
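The expectation that a backcross-derived plant resembles the recurrent parent can be made concrete with a small Python sketch. It uses the standard genome-wide expectation for unlinked loci in the absence of selection, which is an assumption added here for illustration; the region linked to the selected donor trait will deviate from it, and the function name is hypothetical.

```python
def recurrent_parent_fraction(backcrosses: int) -> float:
    """Expected genome fraction from the recurrent parent after n backcrosses.

    n = 0 corresponds to the initial F1 cross. Assumes unlinked loci and no
    selection (an illustrative simplification, not a measured value).
    """
    if backcrosses < 0:
        raise ValueError("backcrosses must be zero or greater")
    # The donor contribution is halved by the F1 cross and by every backcross.
    return 1.0 - 0.5 ** (backcrosses + 1)


if __name__ == "__main__":
    for n in range(0, 7):
        print(f"BC{n}: {recurrent_parent_fraction(n):.4f}")
```

Under these assumptions five backcrosses give an expected recurrent-parent share of about 98%.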

The single-seed descent procedure in the strict sense refers to planting a segregating population, harvesting a sample of one seed per plant, and using the one-seed sample to plant the next generation. When the population has been advanced from the F₂ to the desired level of inbreeding, the plants from which lines are derived will each trace to different F₂ individuals. The number of plants in a population declines each generation due to failure of some seeds to germinate or some plants to produce at least one seed. As a result, not all of the F₂ plants originally sampled in the population will be represented by a progeny when generation advance is completed.

In a multiple-seed procedure, breeders commonly harvest one or more pods from each plant in a population and thresh them together to form a bulk. Part of the bulk is used to plant the next generation and part is put in reserve. The procedure has been referred to as modified single-seed descent or the pod-bulk technique.

The multiple-seed procedure has been used to save labor at harvest. It is considerably faster to thresh pods with a machine than to remove one seed from each by hand for the single-seed procedure. The multiple-seed procedure also makes it possible to plant the same number of seeds of a population each generation of inbreeding.

Descriptions of other breeding methods that are commonly used for different traits and crops can be found in one of several reference books (e.g., Fehr, Principles of Cultivar Development, Vol. 1, pp. 2-3 (1987)).

A transgenic plant of the present invention may also be reproduced using apomixis. Apomixis is a genetically controlled method of reproduction in plants where the embryo is formed without union of an egg and a sperm. There are three basic types of apomictic reproduction: 1) apospory where the embryo develops from a chromosomally unreduced egg in an embryo sac derived from the nucellus; 2) diplospory where the embryo develops from an unreduced egg in an embryo sac derived from the megaspore mother cell; and 3) adventitious embryony where the embryo develops directly from a somatic cell. In most forms of apomixis, pseudogamy, or fertilization of the polar nuclei to produce endosperm, is necessary for seed viability. In apospory, a nurse cultivar can be used as a pollen source for endosperm formation in seeds. The nurse cultivar does not affect the genetics of the aposporous apomictic cultivar since the unreduced egg of the cultivar develops parthenogenetically, but makes possible endosperm production. Apomixis is economically important, especially in transgenic plants, because it causes any genotype, no matter how heterozygous, to breed true. Thus, with apomictic reproduction, heterozygous transgenic plants can maintain their genetic fidelity throughout repeated life cycles. Methods for the production of apomictic plants are known in the art. See, U.S. Pat. No. 5,811,636.

Other Organisms

A nucleic acid of the present invention may be introduced into any cell or organism such as a mammalian cell, mammal, fish cell, fish, bird cell, bird, algae cell, algae, fungal cell, fungi, or bacterial cell. A protein of the present invention may be produced in an appropriate cell or organism. Preferred hosts and transformants include: fungal cells such as Aspergillus, yeasts, mammals, particularly bovine and porcine, insects, bacteria, and algae. Particularly preferred bacteria are Agrobacterium tumefaciens and E. coli.

Methods to transform such cells or organisms are known in the art (EP 0 238 023; Yelton et al., Proc. Natl. Acad. Sci. (U.S.A.), 81:1470-1474 (1984); Malardier et al., Gene, 78:147-156 (1989); Becker and Guarente, In: Abelson and Simon (eds.), Guide to Yeast Genetics and Molecular Biology, Methods Enzymol., Vol. 194, pp. 182-187, Academic Press, Inc., NY; Ito et al., J. Bacteriology, 153:163 (1983); Hinnen et al., Proc. Natl. Acad. Sci. (U.S.A.), 75:1920 (1978); Bennett and LaSure (eds.), More Gene Manipulations in Fungi, Academic Press, CA (1991)). Methods to produce proteins of the present invention are also known (Kudla et al., EMBO, 9:1355-1364 (1990); Jarai and Buxton, Current Genetics, 26:2238-2244 (1994); Verdier, Yeast, 6:271-297 (1990); MacKenzie et al., Journal of Gen. Microbiol., 139:2295-2307 (1993); Hartl et al., TIBS, 19:20-25 (1994); Bergeron et al., TIBS, 19:124-128 (1994); Demolder et al., J. Biotechnology, 32:179-189 (1994); Craig, Science, 260:1902-1903 (1993); Gething and Sambrook, Nature, 355:33-45 (1992); Puig and Gilbert, J. Biol. Chem., 269:7764-7771 (1994); Wang and Tsou, FASEB Journal, 7:1515-1517 (1993); Robinson et al., Bio/Technology, 1:381-384 (1994); Enderlin and Ogrydziak, Yeast, 10:67-79 (1994); Fuller et al., Proc. Natl. Acad. Sci. (U.S.A.), 86:1434-1438 (1989); Julius et al., Cell, 37:1075-1089 (1984); Julius et al., Cell, 32:839-852 (1983)).

In a preferred embodiment, overexpression of a protein or fragment thereof of the present invention in a cell or organism provides in that cell or organism, relative to an untransformed cell or organism with a similar genetic background, an increased level of tocopherols.

In a preferred embodiment, overexpression of a protein or fragment thereof of the present invention in a cell or organism provides in that cell or organism, relative to an untransformed cell or organism with a similar genetic background, an increased level of α-tocopherols.

In a preferred embodiment, overexpression of a protein or fragment thereof of the present invention in a cell or organism provides in that cell or organism, relative to an untransformed cell or organism with a similar genetic background, an increased level of γ-tocopherols.

In another preferred embodiment, overexpression of a protein or fragment thereof of the present invention in a cell or organism provides in that cell or organism, relative to an untransformed cell or organism with a similar genetic background, an increased level of α-tocotrienols.

In another preferred embodiment, overexpression of a protein or fragment thereof of the present invention in a cell or organism provides in that cell or organism, relative to an untransformed cell or organism with a similar genetic background, an increased level of γ-tocotrienols.

Antibodies

One aspect of the present invention concerns antibodies, single-chain antigen binding molecules, or other proteins that specifically bind to one or more of the protein or peptide molecules of the present invention and their homologs, fusions or fragments. In a particularly preferred embodiment, the antibody specifically binds to a protein having the amino acid sequence set forth in SEQ ID NOs: 5, 9-11, 43-44, 57-58, and 90, or fragments thereof. Antibodies of the present invention may be used to quantitatively or qualitatively detect the protein or peptide molecules of the present invention, or to detect post-translational modifications of the proteins. As used herein, an antibody or peptide is said to “specifically bind” to a protein or peptide molecule of the present invention if such binding is not competitively inhibited by the presence of non-related molecules.

Nucleic acid molecules that encode all or part of the protein of the present invention can be expressed, via recombinant means, to yield protein or peptides that can in turn be used to elicit antibodies that are capable of binding the expressed protein or peptide. Such antibodies may be used in immunoassays for that protein. Such protein-encoding molecules, or their fragments, may be a “fusion” molecule (i.e., a part of a larger nucleic acid molecule) such that, upon expression, a fusion protein is produced. It is understood that any of the nucleic acid molecules of the present invention may be expressed, via recombinant means, to yield proteins or peptides encoded by these nucleic acid molecules.

The antibodies that specifically bind proteins and protein fragments of the present invention may be polyclonal or monoclonal and may comprise intact immunoglobulins, or antigen-binding fragments of immunoglobulins (such as F(ab′) or F(ab′)₂), or single-chain immunoglobulins producible, for example, via recombinant means. It is understood that practitioners are familiar with the standard resource materials that describe specific conditions and procedures for the construction, manipulation and isolation of antibodies (see, for example, Harlow and Lane, In: Antibodies: A Laboratory Manual, Cold Spring Harbor Press, Cold Spring Harbor, N.Y. (1988)).

As discussed below, such antibody molecules or their fragments may be used for diagnostic purposes. Where the antibodies are intended for diagnostic purposes, it may be desirable to derivatize them, for example with a ligand group (such as biotin) or a detectable marker group (such as a fluorescent group, a radioisotope or an enzyme).

The ability to produce antibodies that bind the protein or peptide molecules of the present invention permits the identification of mimetic compounds derived from those molecules. These mimetic compounds may contain a fragment of the protein or peptide or merely a structurally similar region and nonetheless exhibit an ability to specifically bind to antibodies directed against that compound.

Exemplary Uses

Nucleic acid molecules and fragments thereof of the present invention may be employed to obtain other nucleic acid molecules from the same species (for example, nucleic acid molecules from corn may be utilized to obtain other nucleic acid molecules from corn). Such nucleic acid molecules include the nucleic acid molecules that encode the complete coding sequence of a protein and promoters and flanking sequences of such molecules. In addition, such nucleic acid molecules include nucleic acid molecules that encode other isozymes or gene family members. Such molecules can be readily obtained by using the above-described nucleic acid molecules or fragments thereof to screen cDNA or genomic libraries. Methods for forming such libraries are well known in the art.

Nucleic acid molecules and fragments thereof of the present invention may also be employed to obtain nucleic acid homologs. Such homologs include the nucleic acid molecules of plants and other organisms, including bacteria and fungi, including the nucleic acid molecules that encode, in whole or in part, protein homologues of other plant species or other organisms, and sequences of genetic elements, such as promoters and transcriptional regulatory elements. Such molecules can be readily obtained by using the above-described nucleic acid molecules or fragments thereof to screen cDNA or genomic libraries obtained from such plant species. Methods for forming such libraries are well known in the art. Such homolog molecules may differ in their nucleotide sequences from those coding for one or more of SEQ ID NOs: 5, 9-11, 43-44, 57-58, and 90, and complements thereof, because complete complementarity is not needed for stable hybridization. The nucleic acid molecules of the present invention therefore also include molecules that, although capable of specifically hybridizing with the nucleic acid molecules, may lack “complete complementarity”.

Any of a variety of methods may be used to obtain one or more of the above-described nucleic acid molecules (Zamecnik et al., Proc. Natl. Acad. Sci. (U.S.A.), 83:4143-4146 (1986); Goodchild et al., Proc. Natl. Acad. Sci. (U.S.A.), 85:5507-5511 (1988); Wickstrom et al., Proc. Natl. Acad. Sci. (U.S.A.), 85:1028-1032 (1988); Holt et al., Molec. Cell. Biol., 8:963-973 (1988); Gewirtz et al., Science, 242:1303-1306 (1988); Anfossi et al., Proc. Natl. Acad. Sci. (U.S.A.), 86:3379-3383 (1989); Becker et al., EMBO J., 8:3685-3691 (1989)). Automated nucleic acid synthesizers may be employed for this purpose. In lieu of such synthesis, the disclosed nucleic acid molecules may be used to define a pair of primers that can be used with the polymerase chain reaction (Mullis et al., Cold Spring Harbor Symp. Quant. Biol., 51:263-273 (1986); Erlich et al., EP 50 424; EP 84 796; EP 258 017; EP 237 362; Mullis, EP 201 184; Mullis et al., U.S. Pat. No. 4,683,202; Erlich, U.S. Pat. No. 4,582,788; and Saiki et al., U.S. Pat. No. 4,683,194) to amplify and obtain any desired nucleic acid molecule or fragment.
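How a known sequence defines a primer pair can be sketched in Python. This is a minimal illustration only: the forward primer is taken from the start of the target strand and the reverse primer is the reverse complement of its end. The function names, the default primer length of 20 bases and the example sequence are all hypothetical, and a practical design would also weigh melting temperature, GC content and self-complementarity.

```python
from typing import Tuple

# Translation table for Watson-Crick complements (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def flanking_primers(target: str, primer_len: int = 20) -> Tuple[str, str]:
    """Return (forward, reverse) primers, both written 5' to 3'.

    The forward primer matches the first primer_len bases of the target;
    the reverse primer anneals to the last primer_len bases, so it is the
    reverse complement of that stretch.
    """
    if primer_len < 1 or len(target) < 2 * primer_len:
        raise ValueError("target must be at least twice the primer length")
    forward = target[:primer_len]
    reverse = reverse_complement(target[-primer_len:])
    return forward, reverse


if __name__ == "__main__":
    # Hypothetical sequence, used only to exercise the functions.
    print(flanking_primers("ATGCATGCAAGGTTCC", primer_len=4))
```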

Promoter sequences and other genetic elements, including but not limited to transcriptional regulatory flanking sequences, associated with one or more of the disclosed nucleic acid sequences can also be obtained using the disclosed nucleic acid sequences provided herein. In one embodiment, such sequences are obtained by incubating nucleic acid molecules of the present invention with members of genomic libraries and recovering clones that hybridize to such nucleic acid molecules. In a second embodiment, methods of “chromosome walking”, or inverse PCR, may be used to obtain such sequences (Frohman et al., Proc. Natl. Acad. Sci. (U.S.A.), 85:8998-9002 (1988); Ohara et al., Proc. Natl. Acad. Sci. (U.S.A.), 86:5673-5677 (1989); Pang et al., Biotechniques, 22:1046-1048 (1997); Huang et al., Methods Mol. Biol., 69:89-96 (1997); Huang et al., Methods Mol. Biol., 67:287-294 (1997); Benkel et al., Genet. Anal., 13:123-127 (1996); Hartl et al., Methods Mol. Biol., 58:293-301 (1996)). The term “chromosome walking” means a process of extending a genetic map by successive hybridization steps.

The nucleic acid molecules of the present invention may be used to isolate promoters of cell enhanced, cell specific, tissue enhanced, tissue specific, developmentally or environmentally regulated expression profiles. Isolation and functional analysis of the 5′ flanking promoter sequences of these genes from genomic libraries, for example, using genomic screening methods and PCR techniques, would result in the isolation of useful promoters and transcriptional regulatory elements. These methods are known to those of skill in the art and have been described (see, for example, Birren et al., Genome Analysis: Analyzing DNA, 1, (1997), Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.). Promoters obtained utilizing the nucleic acid molecules of the present invention could also be modified to affect their control characteristics. Examples of such modifications would include but are not limited to enhancer sequences. Such genetic elements could be used to enhance gene expression of new and existing traits for crop improvement.

Another subset of the nucleic acid molecules of the present invention includes nucleic acid molecules that are markers. The markers can be used in a number of conventional ways in the field of molecular genetics. Such markers include nucleic acid molecules encoding SEQ ID NOs: 5, 9-11, 43-44, 57-58, and 90, and complements thereof, and fragments of either that can act as markers and other nucleic acid molecules of the present invention that can act as markers.

Genetic markers of the present invention include “dominant” or “codominant” markers. “Codominant markers” reveal the presence of two or more alleles (two per diploid individual) at a locus. “Dominant markers” reveal the presence of only a single allele per locus. The presence of the dominant marker phenotype (e.g., a band of DNA) is an indication that one allele is in either the homozygous or heterozygous condition. The absence of the dominant marker phenotype (e.g., absence of a DNA band) is merely evidence that “some other” undefined allele is present. In the case of populations where individuals are predominantly homozygous and loci are predominantly dimorphic, dominant and codominant markers can be equally valuable. As populations become more heterozygous and multi-allelic, codominant markers often become more informative of the genotype than dominant markers. Marker molecules can be, for example, capable of detecting polymorphisms such as single nucleotide polymorphisms (SNPs).

The genomes of animals and plants naturally undergo spontaneous mutation in the course of their continuing evolution (Gusella, Ann. Rev. Biochem., 55:831-854 (1986)). A “polymorphism” is a variation or difference in the sequence of the gene or its flanking regions that arises in some of the members of a species. The variant sequence and the “original” sequence co-exist in the species' population. In some instances, such co-existence is in stable or quasi-stable equilibrium.

A polymorphism is thus said to be “allelic”, in that, due to the existence of the polymorphism, some members of a population may have the original sequence (i.e., the original “allele”) whereas other members may have the variant sequence (i.e., the variant “allele”). In the simplest case, only one variant sequence may exist and the polymorphism is thus said to be di-allelic. In other cases, the species' population may contain multiple alleles and the polymorphism is termed tri-allelic, etc. A single gene may have multiple different unrelated polymorphisms. For example, it may have a di-allelic polymorphism at one site and a multi-allelic polymorphism at another site.

The variation that defines the polymorphism may range from a single nucleotide variation to the insertion or deletion of extended regions within a gene. In some cases, the DNA sequence variations are in regions of the genome that are characterized by short tandem repeats (STRs) that include tandem di- or tri-nucleotide repeated motifs of nucleotides. Polymorphisms characterized by such tandem repeats are referred to as “variable number tandem repeat” (“VNTR”) polymorphisms. VNTRs have been used in identity analysis (Weber, U.S. Pat. No. 5,075,217; Armour et al., FEBS Lett., 307:113-115 (1992); Jones et al., Eur. J. Haematol., 39:144-147 (1987); Horn et al., PCT Application WO 91/14003; Jeffreys, EP 370 719; Jeffreys, U.S. Pat. No. 5,175,082; Jeffreys et al., Amer. J. Hum. Genet., 39:11-24 (1986); Jeffreys et al., Nature, 316:76-79 (1985); Gray et al., Proc. R. Acad. Soc. Lond., 243:241-253 (1991); Moore et al., Genomics, 10:654-660 (1991); Jeffreys et al., Anim. Genet., 18:1-15 (1987); Hillel et al., Anim. Genet., 20:145-155 (1989); Hillel et al., Genet., 124:783-789 (1990)).

The detection of polymorphic sites in a sample of DNA may be facilitated through the use of nucleic acid amplification methods. Such methods specifically increase the concentration of polynucleotides that span the polymorphic site, or include that site and sequences located either distal or proximal to it. Such amplified molecules can be readily detected by gel electrophoresis or other means.

In an alternative embodiment, such polymorphisms can be detected through the use of a marker nucleic acid molecule that is physically linked to such polymorphism(s). For this purpose, marker nucleic acid molecules comprising a nucleotide sequence of a polynucleotide located within 1 Mb of the polymorphism(s), more preferably within 100 kb of the polymorphism(s), and most preferably within 10 kb of the polymorphism(s) can be employed.

The identification of a polymorphism can be determined in a variety of ways. By correlating the presence or absence of a polymorphism in a plant with the presence or absence of a phenotype, it is possible to predict the phenotype of that plant. If a polymorphism creates or destroys a restriction endonuclease cleavage site, or if it results in the loss or insertion of DNA (e.g., a VNTR polymorphism), it will alter the size or profile of the DNA fragments that are generated by digestion with that restriction endonuclease. As such, organisms that possess a variant sequence can be distinguished from those having the original sequence by restriction fragment analysis. Polymorphisms that can be identified in this manner are termed “restriction fragment length polymorphisms” (RFLPs) (Glassberg, UK Patent Application 2135774; Skolnick et al., Cytogen. Cell Genet., 32:58-67 (1982); Botstein et al., Am. J. Hum. Genet., 32:314-331 (1980); Fischer et al., PCT Application WO 90/13668; Uhlen, PCT Application WO 90/11369).
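The effect of a destroyed restriction site on the fragment profile can be illustrated with a short Python sketch. The two sequences are invented for the example, the function name is hypothetical, and the enzyme is assumed to be EcoRI, which cuts one base into its GAATTC recognition site; only one strand of a linear molecule is modeled.

```python
from typing import List


def digest(seq: str, site: str, cut_offset: int) -> List[int]:
    """Return fragment lengths after cutting a linear sequence at every site.

    cut_offset is the number of bases into the recognition site at which
    the strand is cut (1 for EcoRI, G^AATTC).
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = seq.find(site, pos + 1)
    # The remainder after the last cut is the final fragment.
    fragments.append(len(seq) - start)
    return fragments


if __name__ == "__main__":
    # Hypothetical alleles: the variant carries a single base change (A to G)
    # that destroys the second EcoRI site.
    original = "AAAA" + "GAATTC" + "CCCCCC" + "GAATTC" + "TTTT"
    variant = "AAAA" + "GAATTC" + "CCCCCC" + "GAGTTC" + "TTTT"
    print(digest(original, "GAATTC", 1))
    print(digest(variant, "GAATTC", 1))
```

In this toy case the original allele yields three fragments and the variant yields two, which is the kind of size difference that restriction fragment analysis resolves.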

Polymorphisms can also be identified by Single Strand Conformation Polymorphism (SSCP) analysis (Elles, Methods in Molecular Medicine: Molecular Diagnosis of Genetic Diseases, Humana Press (1996); Orita et al., Genomics, 5:874-879 (1989)). A number of protocols have been described for SSCP including, but not limited to, Lee et al., Anal. Biochem., 205:289-293 (1992); Suzuki et al., Anal. Biochem., 192:82-84 (1991); Lo et al., Nucleic Acids Research, 20:1005-1009 (1992); Sarkar et al., Genomics, 13:441-443 (1992). It is understood that one or more of the nucleic acids of the present invention may be utilized as markers or probes to detect polymorphisms by SSCP analysis.

Polymorphisms may also be found using a DNA fingerprinting technique called amplified fragment length polymorphism (AFLP), which is based on the selective PCR amplification of restriction fragments from a total digest of genomic DNA to profile that DNA (Vos et al., Nucleic Acids Res., 23:4407-4414 (1995)). This method allows for the specific co-amplification of high numbers of restriction fragments, which can be visualized by PCR without knowledge of the nucleic acid sequence. It is understood that one or more of the nucleic acids of the present invention may be utilized as markers or probes to detect polymorphisms by AFLP analysis or for fingerprinting RNA.

Polymorphisms may also be found using random amplified polymorphic DNA (RAPD) (Williams et al., Nucl. Acids Res., 18:6531-6535 (1990)) and cleaved amplified polymorphic sequences (CAPS) (Lyamichev et al., Science, 260:778-783 (1993)). It is understood that one or more of the nucleic acid molecules of the present invention may be utilized as markers or probes to detect polymorphisms by RAPD or CAPS analysis.

Single Nucleotide Polymorphisms (SNPs) generally occur at greater frequency than other polymorphic markers and are spaced with a greater uniformity throughout a genome than other reported forms of polymorphism. The greater frequency and uniformity of SNPs means that there is greater probability that such a polymorphism will be found near or in a genetic locus of interest than would be the case for other polymorphisms. SNPs are located in protein-coding regions and noncoding regions of a genome. Some of these SNPs may result in defective or variant protein expression (e.g., as a result of mutations or defective splicing). Analysis (genotyping) of characterized SNPs can require only a plus/minus assay rather than a lengthy measurement, permitting easier automation.

SNPs can be characterized using any of a variety of methods. Such methods include the direct or indirect sequencing of the site, the use of restriction enzymes (Botstein et al., Am. J. Hum. Genet., 32:314-331 (1980); Konieczny and Ausubel, Plant J., 4:403-410 (1993)), enzymatic and chemical mismatch assays (Myers et al., Nature, 313:495-498 (1985)), allele-specific PCR (Newton et al., Nucl. Acids Res., 17:2503-2516 (1989); Wu et al., Proc. Natl. Acad. Sci. (U.S.A.), 86:2757-2760 (1989)), ligase chain reaction (Barany, Proc. Natl. Acad. Sci. (U.S.A.), 88:189-193 (1991)), single-strand conformation polymorphism analysis (Labrune et al., Am. J. Hum. Genet., 48:1115-1120 (1991)), single base primer extension (Kuppuswamy et al., Proc. Natl. Acad. Sci. (U.S.A.), 88:1143-1147 (1991); Goelet, U.S. Pat. No. 6,004,744; Goelet, U.S. Pat. No. 5,888,819), solid-phase ELISA-based oligonucleotide ligation assays (Nikiforov et al., Nucl. Acids Res., 22:4167-4175 (1994)), dideoxy fingerprinting (Sarkar et al., Genomics, 13:441-443 (1992)), oligonucleotide fluorescence-quenching assays (Livak et al., PCR Methods Appl., 4:357-362 (1995a)), 5′-nuclease allele-specific hybridization TaqMan™ assay (Livak et al., Nature Genet., 9:341-342 (1995)), template-directed dye-terminator incorporation (TDI) assay (Chen and Kwok, Nucl. Acids Res., 25:347-353 (1997)), allele-specific molecular beacon assay (Tyagi et al., Nature Biotech., 16:49-53 (1998)), PinPoint assay (Haff and Smirnov, Genome Res., 7:378-388 (1997)), dCAPS analysis (Neff et al., Plant J., 14:387-392 (1998)), pyrosequencing (Ronaghi et al., Analytical Biochemistry, 267:65-71 (1999); Ronaghi et al., WO 98/13523; Nyren et al., WO 98/28440; www.pyrosequencing.com), using mass spectrometry, e.g., the Masscode™ system (Howbert et al., WO 99/05319; Howbert et al., WO 97/27331; www.rapigene.com; Becker et al., WO 98/26095; Becker et al., WO 98/12355; Becker et al., WO 97/33000; Monforte et al., U.S. Pat. No. 5,965,363), invasive cleavage of oligonucleotide probes (Lyamichev et al., Nature Biotechnology, 17:292-296; www.twt.com), and using high density oligonucleotide arrays (Hacia et al., Nature Genetics, 22:164-167; www.affymetrix.com).

Polymorphisms may also be detected using allele-specificoligonucleotides (ASO), which, can be for example, used in combinationwith hybridization based technology including Southern, Northern, anddot blot hybridizations, reverse dot blot hybridizations andhybridizations performed on microarray and related technology.

The stringency of hybridization for polymorphism detection is highly dependent upon a variety of factors, including length of the allele-specific oligonucleotide, sequence composition, degree of complementarity (i.e., presence or absence of base mismatches), concentration of salts and other factors such as formamide and temperature. These factors are important both during the hybridization itself and during subsequent washes performed to remove target polynucleotide that is not specifically hybridized. In practice, the conditions of the final, most stringent wash are most critical. In addition, the amount of target polynucleotide that is able to hybridize to the allele-specific oligonucleotide is also governed by such factors as the concentration of both the ASO and the target polynucleotide, the presence and concentration of factors that act to “tie up” water molecules, so as to effectively concentrate the reagents (e.g., PEG, dextran, dextran sulfate, etc.), whether the nucleic acids are immobilized or in solution, and the duration of hybridization and washing steps.

Hybridizations are preferably performed below the melting temperature (T_(m)) of the ASO. The closer the hybridization and/or washing step is to the T_(m), the higher the stringency. T_(m) for an oligonucleotide may be approximated, for example, according to the following formula: T_(m) = 81.5 + 16.6 × log₁₀[Na⁺] + 0.41 × (% G+C) − 675/n, where [Na⁺] is the molar salt concentration of Na⁺ or any other suitable cation and n is the number of bases in the oligonucleotide. Other formulas for approximating T_(m) are available and are known to those of ordinary skill in the art.
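By way of illustration only, the approximation above can be evaluated as in the following minimal sketch. The function name `approx_tm` and its parameter names are hypothetical and do not appear elsewhere in this specification; the sketch simply evaluates the formula as written and is not a substitute for empirically determined hybridization conditions.

```python
import math


def approx_tm(oligo: str, na_molar: float) -> float:
    """Approximate T_m (degrees C) of an oligonucleotide.

    Evaluates: T_m = 81.5 + 16.6*log10([Na+]) + 0.41*(%G+C) - 675/n
    `oligo` is the ASO sequence; `na_molar` is the molar cation concentration.
    (Hypothetical helper; names are illustrative only.)
    """
    seq = oligo.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("oligonucleotide sequence is empty")
    if na_molar <= 0:
        raise ValueError("cation concentration must be positive")
    # Percentage of G and C bases in the oligonucleotide.
    gc_percent = 100.0 * sum(base in "GC" for base in seq) / n
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 675.0 / n


# Example: a 20-mer with 50% G+C in 1 M Na+.
tm_example = approx_tm("ACGTACGTACGTACGTACGT", 1.0)
```

Because the salt term is logarithmic, lowering the cation concentration ten-fold lowers the approximated T_(m) by 16.6 degrees under this formula.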

Stringency is preferably adjusted so as to allow a given ASO to differentially hybridize to a target polynucleotide of the correct allele and a target polynucleotide of the incorrect allele. Preferably, there will be at least a two-fold differential between the signal produced by the ASO hybridizing to a target polynucleotide of the correct allele and the level of the signal produced by the ASO cross-hybridizing to a target polynucleotide of the incorrect allele (e.g., an ASO specific for a mutant allele cross-hybridizing to a wild-type allele). In more preferred embodiments of the present invention, there is at least a five-fold signal differential. In highly preferred embodiments of the present invention, there is at least an order of magnitude signal differential between the ASO hybridizing to a target polynucleotide of the correct allele and the level of the signal produced by the ASO cross-hybridizing to a target polynucleotide of the incorrect allele.

While certain methods for detecting polymorphisms are described herein, other detection methodologies may be utilized. For example, additional methodologies are known and set forth in Birren et al., Genome Analysis: A Laboratory Manual 4: Mapping Genomes, pp. 135-186, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1999); Maliga et al., Methods in Plant Molecular Biology. A Laboratory Course Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1995); Paterson, Biotechnology Intelligence Unit: Genome Mapping in Plants, R.G. Landes Co., Georgetown, Tex., and Academic Press, San Diego, Calif. (1996); The Corn Handbook, Freeling and Walbot, (eds.), Springer-Verlag, New York, N.Y. (1994); Methods in Molecular Medicine: Molecular Diagnosis of Genetic Diseases, Elles, (ed.), Humana Press, Totowa, N.J. (1996); Clark, (ed.), Plant Molecular Biology: A Laboratory Manual, Springer-Verlag, Berlin, Germany (1997).

Factors for marker-assisted selection in a plant breeding program are: (1) the marker(s) should co-segregate or be closely linked with the desired trait; (2) an efficient means of screening large populations for the molecular marker(s) should be available; and (3) the screening technique should have high reproducibility across laboratories and preferably be economical to use and be user-friendly.

The genetic linkage of marker molecules can be established by a gene mapping model such as, without limitation, the flanking marker model reported by Lander and Botstein, Genetics, 121:185-199 (1989) and the interval mapping, based on maximum likelihood methods described by Lander and Botstein, Genetics, 121:185-199 (1989) and implemented in the software package MAPMAKER/QTL (Lincoln and Lander, Mapping Genes Controlling Quantitative Traits Using MAPMAKER/QTL, Whitehead Institute for Biomedical Research, MA (1990)). Additional software includes Qgene, Version 2.23 (1996), Department of Plant Breeding and Biometry, 266 Emerson Hall, Cornell University, Ithaca, N.Y. Use of Qgene software is a particularly preferred approach.

A maximum likelihood estimate (MLE) for the presence of a marker is calculated, together with an MLE assuming no QTL effect, to avoid false positives. A log₁₀ of an odds ratio (LOD) is then calculated as: LOD = log₁₀ (MLE for the presence of a QTL / MLE given no linked QTL).

The LOD score essentially indicates how much more likely the data are to have arisen assuming the presence of a QTL than in its absence. The LOD threshold value for avoiding a false positive with a given confidence, say 95%, depends on the number of markers and the length of the genome. Graphs indicating LOD thresholds are set forth in Lander and Botstein, Genetics, 121:185-199 (1989) and further described by Arús and Moreno-González, Plant Breeding, Hayward et al., (eds.) Chapman & Hall, London, pp. 314-331 (1993).

In a preferred embodiment of the present invention the nucleic acid marker exhibits a LOD score of greater than 2.0, more preferably greater than 2.5, even more preferably greater than 3.0 or 4.0 with the trait or phenotype of interest. In a preferred embodiment, the trait of interest is altered tocopherol levels or compositions or altered tocotrienol levels or compositions.
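As a non-limiting illustration of the LOD calculation and thresholds described above, the following sketch computes a LOD score from two likelihood values and compares it with a chosen threshold. The function names and the example likelihood values are hypothetical; in practice the likelihoods would be produced by mapping software such as that cited above.

```python
import math


def lod_score(likelihood_with_qtl: float, likelihood_no_qtl: float) -> float:
    """LOD = log10(likelihood assuming a QTL / likelihood assuming no linked QTL)."""
    if likelihood_with_qtl <= 0 or likelihood_no_qtl <= 0:
        raise ValueError("likelihoods must be positive")
    return math.log10(likelihood_with_qtl / likelihood_no_qtl)


def exceeds_threshold(lod: float, threshold: float = 2.0) -> bool:
    """True if the LOD score is greater than the chosen threshold (e.g., 2.0, 2.5, 3.0 or 4.0)."""
    return lod > threshold


# Illustrative values only: the data are 1000 times more likely with a QTL than without.
example_lod = lod_score(1e-10, 1e-13)
```

A LOD score of 3.0 therefore corresponds to the data being 1000 times more likely under the QTL model than under the no-QTL model.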

Additional models can be used. Many modifications and alternative approaches to interval mapping have been reported, including the use of non-parametric methods (Kruglyak and Lander, Genetics, 139:1421-1428 (1995)). Multiple regression methods or models can also be used, in which the trait is regressed on a large number of markers (Jansen, Biometrics in Plant Breeding, van Oijen and Jansen (eds.), Proceedings of the Ninth Meeting of the Eucarpia Section Biometrics in Plant Breeding, The Netherlands, pp. 116-124 (1994); Weber and Wricke, Advances in Plant Breeding, Blackwell, Berlin, 16 (1994)). Procedures combining interval mapping with regression analysis, whereby the phenotype is regressed onto a single putative QTL at a given marker interval and at the same time onto a number of markers that serve as “cofactors”, have been reported by Jansen and Stam, Genetics, 136:1447-1455 (1994); and Zeng, Genetics, 136:1457-1468 (1994). Generally, the use of cofactors reduces the bias and sampling error of the estimated QTL positions (Utz and Melchinger, Biometrics in Plant Breeding, van Oijen and Jansen (eds.), Proceedings of the Ninth Meeting of the Eucarpia Section Biometrics in Plant Breeding, The Netherlands, pp. 195-204 (1994)), thereby improving the precision and efficiency of QTL mapping (Zeng, Genetics, 136:1457-1468 (1994)). These models can be extended to multi-environment experiments to analyze genotype-environment interactions (Jansen et al., Theo. Appl. Genet., 91:33-37 (1995)).

It is understood that one or more of the nucleic acid molecules of the present invention may be used as molecular markers. It is also understood that one or more of the protein molecules of the present invention may be used as molecular markers.

In a preferred embodiment, the polymorphism is present and screened for in a mapping population, e.g., a collection of plants capable of being used with markers such as polymorphic markers to map genetic position of traits. The choice of appropriate mapping population often depends on the type of marker systems employed (Tanksley et al., J. P. Gustafson and R. Appels (eds.). Plenum Press, NY, pp. 157-173 (1988)). Consideration must be given to the source of parents (adapted vs. exotic) used in the mapping population. Chromosome pairing and recombination rates can be severely disturbed (suppressed) in wide crosses (adapted x exotic) and generally yield greatly reduced linkage distances. Wide crosses will usually provide segregating populations with a relatively large number of polymorphisms when compared to progeny in a narrow cross (adapted x adapted).

An F₂ population is the first generation of selfing (self-pollinating) after the hybrid seed is produced. Usually a single F₁ plant is selfed to generate a population segregating for all the genes in Mendelian (1:2:1) pattern. Maximum genetic information is obtained from a completely classified F₂ population using a codominant marker system (Mather, Measurement of Linkage in Heredity: Methuen and Co., (1938)). In the case of dominant markers, progeny tests (e.g., F₃, BCF₂) are required to identify the heterozygotes, in order to classify the population. However, this procedure is often prohibitive because of the cost and time involved in progeny testing. Progeny testing of F₂ individuals is often used in map construction where phenotypes do not consistently reflect genotype (e.g., disease resistance) or where trait expression is controlled by a QTL. Segregation data from progeny test populations (e.g., F₃ or BCF₂) can be used in map construction. Marker-assisted selection can then be applied to cross progeny based on marker-trait map associations (F₂, F₃), where linkage groups have not been completely disassociated by recombination events (i.e., maximum disequilibrium).
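By way of example only, whether codominant marker genotypes scored in an F₂ population are consistent with the Mendelian 1:2:1 pattern noted above can be checked with a chi-square goodness-of-fit statistic. This test is not prescribed by the present specification; it is offered as one conventional way to examine segregation data. The function name and the genotype counts below are hypothetical, and 5.991 is the standard chi-square critical value for two degrees of freedom at the 5% level.

```python
# Critical value of the chi-square distribution, 2 degrees of freedom, alpha = 0.05.
CHI2_CRITICAL_DF2_P05 = 5.991


def chi_square_1_2_1(n_parent_a: int, n_het: int, n_parent_b: int) -> float:
    """Chi-square statistic for observed F2 genotype counts against a 1:2:1 ratio."""
    observed = (n_parent_a, n_het, n_parent_b)
    total = sum(observed)
    if total == 0:
        raise ValueError("no individuals scored")
    # Expected counts under 1:2:1 segregation.
    expected = (0.25 * total, 0.50 * total, 0.25 * total)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def fits_mendelian_ratio(n_parent_a: int, n_het: int, n_parent_b: int) -> bool:
    """True if the counts do not deviate significantly from 1:2:1 at the 5% level."""
    return chi_square_1_2_1(n_parent_a, n_het, n_parent_b) < CHI2_CRITICAL_DF2_P05


# Hypothetical counts: 100 F2 plants scored at one codominant marker.
balanced = chi_square_1_2_1(25, 50, 25)
distorted = chi_square_1_2_1(40, 40, 20)
```

Markers showing strong segregation distortion under such a check might be re-scored or examined further before being used in map construction.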

Recombinant inbred lines (RIL) (genetically related lines; usually >F₅, developed from continuously selfing F₂ lines towards homozygosity) can be used as a mapping population. Information obtained from dominant markers can be maximized by using RIL because all loci are homozygous or nearly so. Under conditions of tight linkage (i.e., about <10% recombination), dominant and co-dominant markers evaluated in RIL populations provide more information per individual than either marker type in backcross populations (Reiter et al., Proc. Natl. Acad. Sci. (U.S.A.), 89:1477-1481 (1992)). However, as the distance between markers becomes larger (i.e., loci become more independent), the information in RIL populations decreases dramatically when compared to codominant markers.

Backcross populations (e.g., generated from a cross between a successful variety (recurrent parent) and another variety (donor parent) carrying a trait not present in the former) can be utilized as a mapping population. A series of backcrosses to the recurrent parent can be made to recover most of its desirable traits. Thus a population is created consisting of individuals nearly like the recurrent parent but each individual carries varying amounts or mosaic of genomic regions from the donor parent. Backcross populations can be useful for mapping dominant markers if all loci in the recurrent parent are homozygous and the donor and recurrent parent have contrasting polymorphic marker alleles (Reiter et al., Proc. Natl. Acad. Sci. (U.S.A.), 89:1477-1481 (1992)). Information obtained from backcross populations using either codominant or dominant markers is less than that obtained from F₂ populations because one, rather than two, recombinant gamete is sampled per plant. Backcross populations, however, are more informative (at low marker saturation) when compared to RILs as the distance between linked loci increases in RIL populations (i.e., about 0.15% recombination). Increased recombination can be beneficial for resolution of tight linkages, but may be undesirable in the construction of maps with low marker saturation.

Near-isogenic lines (NIL) (created by many backcrosses to produce a collection of individuals that is nearly identical in genetic composition except for the trait or genomic region under interrogation) can be used as a mapping population. In mapping with NILs, only a portion of the polymorphic loci is expected to map to a selected region.

Bulk segregant analysis (BSA) is a method developed for the rapid identification of linkage between markers and traits of interest (Michelmore et al., Proc. Natl. Acad. Sci. (U.S.A.), 88:9828-9832 (1991)). In BSA, two bulked DNA samples are drawn from a segregating population originating from a single cross. These bulks contain individuals that are identical for a particular trait (e.g., resistant or susceptible to a particular disease) or genomic region but arbitrary at unlinked regions (i.e., heterozygous). Regions unlinked to the target region will not differ between the bulked samples of many individuals in BSA.

In an aspect of the present invention, one or more of the nucleic acid molecules of the present invention are used to determine the level (i.e., the concentration of mRNA in a sample, etc.) in a plant (preferably canola, corn, Brassica campestris, oilseed rape, rapeseed, soybean, crambe, mustard, castor bean, peanut, sesame, cottonseed, linseed, safflower, oil palm, flax or sunflower) or pattern (i.e., the kinetics of expression, rate of decomposition, stability profile, etc.) of the expression of a protein encoded in part or whole by one or more of the nucleic acid molecules of the present invention (collectively, the “Expression Response” of a cell or tissue).

As used herein, the Expression Response manifested by a cell or tissue is said to be “altered” if it differs from the Expression Response of cells or tissues of plants not exhibiting the phenotype. To determine whether an Expression Response is altered, the Expression Response manifested by the cell or tissue of the plant exhibiting the phenotype is compared with that of a similar cell or tissue sample of a plant not exhibiting the phenotype. As will be appreciated, it is not necessary to re-determine the Expression Response of the cell or tissue sample of plants not exhibiting the phenotype each time such a comparison is made; rather, the Expression Response of a particular plant may be compared with previously obtained values of normal plants. As used herein, the phenotype of the organism is any of one or more characteristics of an organism (e.g., disease resistance, pest tolerance, environmental tolerance such as tolerance to abiotic stress, male sterility, quality improvement or yield etc.). A change in genotype or phenotype may be transient or permanent. Also as used herein, a tissue sample is any sample that comprises more than one cell. In a preferred aspect, a tissue sample comprises cells that share a common characteristic (e.g., derived from root, seed, flower, leaf, stem or pollen etc.).

In one aspect of the present invention, an evaluation can be conducted to determine whether a particular mRNA molecule is present. One or more of the nucleic acid molecules of the present invention are utilized to detect the presence or quantity of the mRNA species. Such molecules are then incubated with cell or tissue extracts of a plant under conditions sufficient to permit nucleic acid hybridization. The detection of double-stranded probe-mRNA hybrid molecules is indicative of the presence of the mRNA; the amount of such hybrid formed is proportional to the amount of mRNA. Thus, such probes may be used to ascertain the level and extent of the mRNA production in a plant's cells or tissues. Such nucleic acid hybridization may be conducted under quantitative conditions (thereby providing a numerical value of the amount of the mRNA present). Alternatively, the assay may be conducted as a qualitative assay that indicates either that the mRNA is present, or that its level exceeds a user-set, predefined value.

A number of methods can be used to compare the expression response between two or more samples of cells or tissue. These methods include hybridization assays, such as Northern blots, RNase protection assays, and in situ hybridization. Alternatively, the methods include PCR-type assays. In a preferred method, the expression response is compared by hybridizing nucleic acids from the two or more samples to an array of nucleic acids. The array contains a plurality of sequences known or suspected of being present in the cells or tissue of the samples.

An advantage of in situ hybridization over more conventional techniques for the detection of nucleic acids is that it allows an investigator to determine the precise spatial population (Angerer et al., Dev. Biol., 101:477-484 (1984); Angerer et al., Dev. Biol., 112:157-166 (1985); Dixon et al., EMBO J., 10:1317-1324 (1991)). In situ hybridization may be used to measure the steady-state level of RNA accumulation (Hardin et al., J. Mol. Biol., 202:417-431 (1989)). A number of protocols have been devised for in situ hybridization, each with its own tissue preparation, hybridization and washing conditions (Meyerowitz, Plant Mol. Biol. Rep., 5:242-250 (1987); Cox and Goldberg, In: Plant Molecular Biology: A Practical Approach, Shaw (ed.), pp. 1-35, IRL Press, Oxford (1988); Raikhel et al., In situ RNA hybridization in plant tissues, In: Plant Molecular Biology Manual, Vol. B9:1-32, Kluwer Academic Publisher, Dordrecht, The Netherlands (1989)).

In situ hybridization also allows for the localization of proteins within a tissue or cell (Wilkinson, In Situ Hybridization, Oxford University Press, Oxford (1992); Langdale, In Situ Hybridization In: The Corn Handbook, Freeling and Walbot (eds.), pp. 165-179, Springer-Verlag, NY (1994)). It is understood that one or more of the molecules of the present invention, preferably one or more of the nucleic acid molecules or fragments thereof of the present invention or one or more of the antibodies of the present invention may be utilized to detect the level or pattern of a protein or mRNA thereof by in situ hybridization.

Fluorescent in situ hybridization allows the localization of a particular DNA sequence along a chromosome, which is useful, among other uses, for gene mapping, following chromosomes in hybrid lines, or detecting chromosomes with translocations, transversions or deletions. In situ hybridization has been used to identify chromosomes in several plant species (Griffor et al., Plant Mol. Biol., 17:101-109 (1991); Gustafson et al., Proc. Natl. Acad. Sci. (U.S.A.), 87:1899-1902 (1990); Mukai and Gill, Genome, 34:448-452 (1991); Schwarzacher and Heslop-Harrison, Genome, 34:317-323 (1991); Wang et al., Jpn. J. Genet., 66:313-316 (1991); Parra and Windle, Nature Genetics, 5:17-21 (1993)). It is understood that the nucleic acid molecules of the present invention may be used as probes or markers to localize sequences along a chromosome.

Another method to localize the expression of a molecule is tissue printing. Tissue printing provides a way to screen, at the same time and on the same membrane, many tissue sections from different plants or different developmental stages (Yomo and Taylor, Planta, 112:35-43 (1973); Harris and Chrispeels, Plant Physiol., 56:292-299 (1975); Cassab and Varner, J. Cell. Biol., 105:2581-2588 (1987); Spruce et al., Phytochemistry, 26:2901-2903 (1987); Barres et al., Neuron, 5:527-544 (1990); Reid and Pont-Lezica, Tissue Printing: Tools for the Study of Anatomy, Histochemistry and Gene Expression, Academic Press, New York, N.Y. (1992); Reid et al., Plant Physiol., 93:160-165 (1990); Ye et al., Plant J., 1:175-183 (1991)).

One skilled in the art can refer to general reference texts for detailed descriptions of known techniques discussed herein or equivalent techniques. These texts include Current Protocols in Molecular Biology, Ausubel et al., (eds.), John Wiley & Sons, NY (1989), and supplements through September (1998); Molecular Cloning, A Laboratory Manual, Sambrook et al., 2^(nd) Ed., Cold Spring Harbor Press, Cold Spring Harbor, N.Y. (1989); Genome Analysis: A Laboratory Manual 1: Analyzing DNA, Birren et al., Cold Spring Harbor Press, Cold Spring Harbor, N.Y. (1997); Genome Analysis: A Laboratory Manual 2: Detecting Genes, Birren et al., Cold Spring Harbor Press, Cold Spring Harbor, N.Y. (1998); Genome Analysis: A Laboratory Manual 3: Cloning Systems, Birren et al., Cold Spring Harbor Press, Cold Spring Harbor, N.Y. (1999); Genome Analysis: A Laboratory Manual 4: Mapping Genomes, Birren et al., Cold Spring Harbor Press, Cold Spring Harbor, N.Y. (1999); Plant Molecular Biology: A Laboratory Manual, Clark, Springer-Verlag, Berlin, (1997); Methods in Plant Molecular Biology, Maliga et al., Cold Spring Harbor Press, Cold Spring Harbor, N.Y. (1995). These texts can, of course, also be referred to in making or using an aspect of the present invention. It is understood that any of the agents of the present invention can be substantially purified and/or be biologically active and/or recombinant.

Having now generally described the present invention, the same will be more readily understood through reference to the following examples that are provided by way of illustration, and are not intended to be limiting of the present invention, unless specified.

EXAMPLE 1 Identification of Homogentisate Prenyl Transferase Sequences

This example sets forth methods used to analyze homogentisate prenyl transferase sequences from various sources in order to identify motifs common to homogentisate prenyl transferase that are contained therein.

Homogentisate prenyl transferase sequences from Soy, Arabidopsis, Corn and Cuphea (partial) are cloned and sequenced from EST sequences found in an EST database. Synechocystis, Nostoc, and Anabaena sequences are obtained from Genbank. These sequences (representing SEQ ID NOs: 1-8) are then aligned with respect to each other using the multiple alignment software ClustalX, which is described by Thompson et al., Nucleic Acids Research, 25:4876-4882 (1997). The multiple alignment of the protein sequences is visualized and edited using Genedoc, which is described by Nicholas et al., EMBNEW.NEWS, 4:14 (1997).

Using the aforementioned multiple alignment tool, four motifs (A-D) are identified, as shown in FIGS. 2 a-2 c, wherein motifs A-D are set forth. These motifs are represented by SEQ ID NOs: 12-15. The Cuphea sequence is removed from motif D because the sequence had multiple errors towards the 3′ end that generated apparent frame shift errors.

The specificity of these motifs is demonstrated using a Hidden Markov Model (HMM) that is built using an HMMER (version 2.2g) software package (Eddy, Bioinformatics, 14:755-763 (1998)). An HMM search is performed on a cDNA sequence database containing full insert sequence from different plant species. This search identifies two new homogentisate prenyl transferase sequences (SEQ ID NOs: 9-10) in addition to several partial homogentisate prenyl transferase sequences. The two new homogentisate prenyl transferase sequences identified are from leek and wheat. This search also identifies a complete Cuphea sequence (SEQ ID NO: 11) with no errors. A second alignment is generated using the aforementioned multiple alignment tool, as shown in FIGS. 3 a-3 c. This alignment has the leek, wheat, and full Cuphea sequences incorporated. Motifs I-IV (SEQ ID NOs: 39-42) are shown.

Specificity is also tested by using each motif sequence to search the non-redundant amino acid database downloaded from Genbank available through NCBI. All four motifs identify three homogentisate prenyl transferases found in the aforementioned non-redundant amino acid database, as follows: Nostoc, Synechocystis, Arabidopsis. Motifs II and IV also identified some genomic variants of an uncharacterized Arabidopsis protein. Motifs I and III only identified known homogentisate prenyl transferases at an E value of 0.001 or lower.

EXAMPLE 2 Preparation of Expression Constructs

A plasmid containing the napin cassette derived from pCGN3223 (described in U.S. Pat. No. 5,639,790, the entirety of which is incorporated herein by reference) is modified to make it more useful for cloning large DNA fragments containing multiple restriction sites, and to allow the cloning of multiple napin fusion genes into plant binary transformation vectors. An adapter comprised of the self-annealed oligonucleotide of sequence CGCGATTTAAATGGCGCGCCCTGCAGGCGGCCGCCTGCAGGGCGCGCCATTTAAAT (SEQ ID NO: 16) is ligated into the cloning vector pBC SK+ (Stratagene) after digestion with the restriction endonuclease BssHII to construct vector pCGN7765. Plasmids pCGN3223 and pCGN7765 are digested with NotI and ligated together. The resultant vector, pCGN7770, contains the pCGN7765 backbone with the napin seed specific expression cassette from pCGN3223.
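The self-annealing property of the adapter of SEQ ID NO: 16 can be illustrated with the following minimal sketch, which checks that the portion of the oligonucleotide downstream of a four-base 5′ overhang is its own reverse complement and lists which restriction sites it carries. The four-base overhang length is an assumption based on the use of BssHII, and the recognition sequences in `SITES` are the commonly published sequences for those enzymes, supplied here for illustration; they are not recited in this passage. All function and variable names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (upper-case A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


# Adapter sequence as given in the text (SEQ ID NO: 16).
ADAPTER = "CGCGATTTAAATGGCGCGCCCTGCAGGCGGCCGCCTGCAGGGCGCGCCATTTAAAT"

# Assumption: the first four bases form the single-stranded 5' overhang
# that remains after two copies of the oligonucleotide anneal.
OVERHANG_LEN = 4
overhang = ADAPTER[:OVERHANG_LEN]
core = ADAPTER[OVERHANG_LEN:]

# A self-annealing oligonucleotide has a core equal to its own reverse complement.
is_self_complementary = core == reverse_complement(core)

# Commonly published recognition sequences (not taken from this document).
SITES = {
    "SwaI": "ATTTAAAT",
    "AscI": "GGCGCGCC",
    "Sse8387I": "CCTGCAGG",
    "NotI": "GCGGCCGC",
}
sites_present = [name for name, site in SITES.items() if site in core]
```

Under these assumptions the annealed duplex presents identical overhangs at both ends, which is consistent with its insertion into a vector cut with a single enzyme.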

The cloning cassette pCGN7787 comprises essentially the same regulatory elements as pCGN7770, with the exception that the napin regulatory regions of pCGN7770 have been replaced with the double CaMV 35S promoter and the polyadenylation and transcriptional termination region.

A binary vector for plant transformation, pCGN5139, is constructed from pCGN1558 (McBride and Summerfelt, Plant Molecular Biology, 14:269-276 (1990)). The polylinker of pCGN1558 is replaced as a HindIII/Asp718 fragment with a polylinker containing unique restriction endonuclease sites, AscI, PacI, XbaI, SwaI, BamHI, and NotI. The Asp718 and HindIII restriction endonuclease sites are retained in pCGN5139.

A series of binary vectors are constructed to allow for the rapid cloning of DNA sequences into binary vectors containing transcriptional initiation regions (promoters) and transcriptional termination regions.

The plasmid pCGN8618 is constructed by ligating oligonucleotides 5′-TCGAGGATCCGCGGCCGCAAGCTTCCTGCAGG-3′ (SEQ ID NO: 17) and 5′-TCGACCTGCAGGAAGCTTGCGGCCGCGGATCC-3′ (SEQ ID NO: 18) into SalI/XhoI-digested pCGN7770. A fragment containing the napin promoter, polylinker and napin 3′ region is excised from pCGN8618 by digestion with Asp718I; the fragment is blunt-ended by filling in the 5′ overhangs with Klenow fragment then ligated into pCGN5139 that is digested with Asp718I and HindIII and blunt-ended by filling in the 5′ overhangs with Klenow fragment. A plasmid containing the insert oriented so that the napin promoter is closest to the blunted Asp718I site of pCGN5139 and the napin 3′ is closest to the blunted HindIII site is subjected to sequence analysis to confirm both the insert orientation and the integrity of cloning junctions. The resulting plasmid is designated pCGN8622.

The plasmid pCGN8619 is constructed by ligating oligonucleotides 5′-TCGACCTGCAGGAAGCTTGCGGCCGCGGATCC-3′ (SEQ ID NO: 19) and 5′-TCGAGGATCCGCGGCCGCAAGCTTCCTGCAGG-3′ (SEQ ID NO: 20) into SalI/XhoI-digested pCGN7770. A fragment containing the napin promoter, polylinker and napin 3′ region is removed from pCGN8619 by digestion with Asp718I; the fragment is blunt-ended by filling in the 5′ overhangs with Klenow fragment then ligated into pCGN5139 that is digested with Asp718I and HindIII and blunt-ended by filling in the 5′ overhangs with Klenow fragment. A plasmid containing the insert oriented so that the napin promoter is closest to the blunted Asp718I site of pCGN5139 and the napin 3′ is closest to the blunted HindIII site is subjected to sequence analysis to confirm both the insert orientation and the integrity of cloning junctions. The resulting plasmid is designated pCGN8623.
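As an illustration of how the paired linker oligonucleotides above anneal, the following sketch checks that the two oligonucleotides recited for pCGN8618 (the same pair is recited, in the opposite order, for pCGN8619) are complementary outside of a four-base 5′ overhang. The assumption that each oligonucleotide carries a four-base 5′ TCGA overhang, compatible with SalI- and XhoI-generated ends, is the editor's reading of the sequences and is not stated in the text; the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (upper-case A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def anneals_with_overhangs(top: str, bottom: str, overhang_len: int = 4) -> bool:
    """True if the oligos pair perfectly once each 5' overhang is set aside."""
    return top[overhang_len:] == reverse_complement(bottom[overhang_len:])


# Oligonucleotides as given in the text.
OLIGO_A = "TCGAGGATCCGCGGCCGCAAGCTTCCTGCAGG"  # SEQ ID NO: 17 (and 20)
OLIGO_B = "TCGACCTGCAGGAAGCTTGCGGCCGCGGATCC"  # SEQ ID NO: 18 (and 19)

pair_anneals = anneals_with_overhangs(OLIGO_A, OLIGO_B)
overhangs = (OLIGO_A[:4], OLIGO_B[:4])
```

The same check can be adapted to the remaining linker pairs, bearing in mind that some of those oligonucleotides appear to carry a 3′ extension rather than a second 5′ overhang.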

The plasmid pCGN8620 is constructed by ligating oligonucleotides 5′-TCGAGGATCCGCGGCCGCAAGCTTCCTGCAGGAGCT-3′ (SEQ ID NO: 21) and 5′-CCTGCAGGAAGCTTGCGGCCGCGGATCC-3′ (SEQ ID NO: 22) into SalI/SacI-digested pCGN7787. A fragment containing the d35S promoter, polylinker and tml 3′ region is removed from pCGN8620 by complete digestion with Asp718I and partial digestion with NotI. The fragment is blunt-ended by filling in the 5′ overhangs with Klenow fragment then ligated into pCGN5139 that is digested with Asp718I and HindIII and blunt-ended by filling in the 5′ overhangs with Klenow fragment. A plasmid containing the insert oriented so that the d35S promoter is closest to the blunted Asp718I site of pCGN5139 and the tml 3′ is closest to the blunted HindIII site is subjected to sequence analysis to confirm both the insert orientation and the integrity of cloning junctions. The resulting plasmid is designated pCGN8624.

The plasmid pCGN8621 is constructed by ligating oligonucleotides 5′-TCGACCTGCAGGAAGCTTGCGGCCGCGGATCCAGCT-3′ (SEQ ID NO: 23) and 5′-GGATCCGCGGCCGCAAGCTTCCTGCAGG-3′ (SEQ ID NO: 24) into SalI/SacI-digested pCGN7787. A fragment containing the d35S promoter, polylinker and tml 3′ region is removed from pCGN8621 by complete digestion with Asp718I and partial digestion with NotI. The fragment is blunt-ended by filling in the 5′ overhangs with Klenow fragment then ligated into pCGN5139 that had been digested with Asp718I and HindIII and blunt-ended by filling in the 5′ overhangs with Klenow fragment. A plasmid containing the insert oriented so that the d35S promoter is closest to the blunted Asp718I site of pCGN5139 and the tml 3′ is closest to the blunted HindIII site is subjected to sequence analysis to confirm both the insert orientation and the integrity of cloning junctions. The resulting plasmid is designated pCGN8625.

The plasmid construct pCGN8640 is a modification of pCGN8624 described above. A 938 bp PstI fragment isolated from transposon Tn7 which encodes bacterial spectinomycin and streptomycin resistance (Fling et al., Nucleic Acids Research, 13(19):7095-7106 (1985)), a determinant for E. coli and Agrobacterium selection, is blunt-ended with Pfu polymerase. The blunt-ended fragment is ligated into pCGN8624 that had been digested with SpeI and blunt-ended with Pfu polymerase. The region containing the PstI fragment is sequenced to confirm both the insert orientation and the integrity of cloning junctions.

The spectinomycin resistance marker is introduced into pCGN8622 and pCGN8623 as follows. A 7.7 Kbp AvrII-SnaBI fragment from pCGN8640 is ligated to a 10.9 Kbp AvrII-SnaBI fragment from pCGN8623 or pCGN8622, described above. The resulting plasmids are pCGN8641 and pCGN8643, respectively.

The plasmid pCGN8644 is constructed by ligating oligonucleotides 5′-GATCACCTGCAGGAAGCTTGCGGCCGCGGATCCAATGCA-3′ (SEQ ID NO: 25) and 5′-TTGGATCCGCGGCCGCAAGCTTCCTGCAGGT-3′ (SEQ ID NO: 26) into BamHI-PstI digested pCGN8640.

Synthetic oligonucleotides are designed for use in Polymerase Chain Reactions (PCR) to amplify the coding sequences of each of the nucleic acids that encode the polypeptides of SEQ ID NOs: 1-7, 9-11, 43-44, 57-58, and 90 for the preparation of expression constructs.

The coding sequences of each of the nucleic acids that encode the polypeptides of SEQ ID NOs: 1-7, 9-11, 43-44, 57-58, and 90 are all amplified and cloned into the TopoTA vector (Invitrogen). Constructs containing the respective homogentisate prenyl transferase sequences are digested with NotI and Sse8387I and cloned into the turbobinary vectors described above.

Synthetic oligonucleotides were designed for use in Polymerase Chain Reactions (PCR) to amplify SEQ ID NO: 33 for the preparation of expression constructs and are provided in the table below:

Restriction Site    Sequence                                     SEQ ID NO:
5′ NotI             GGATCCGCGGCCGCACAATGGAGTCTCTGCTCTCTAGTTCT    37
3′ SseI             GGATCCTGCAGGTCACTTCAAAAAAGGTAACAGCAAGT       38

SEQ ID NO: 33 was amplified using the respective PCR primers shown in the table above and cloned into the TopoTA vector (Invitrogen). Constructs containing the respective homogentisate prenyl transferase sequences were digested with NotI and Sse8387I and cloned into the turbobinary vectors described above.

SEQ ID NO: 33 was cloned in the sense orientation into pCGN8640 to produce the plant transformation construct pCGN10800 (FIG. 4). SEQ ID NO: 33 is under control of the enhanced 35S promoter.

SEQ ID NO: 33 was also cloned in the antisense orientation into the construct pCGN8641 to create pCGN10801 (FIG. 5). This construct provides for the antisense expression of SEQ ID NO: 33 from the napin promoter.

SEQ ID NO: 33 was also cloned in the sense orientation into the vector pCGN8643 to create the plant transformation construct pCGN10822 (FIG. 7). This construct provides for the sense expression of SEQ ID NO: 33 from the napin promoter.

SEQ ID NO: 33 was also cloned in the antisense orientation into the vector pCGN8644 to create the plant transformation construct pCGN10803 (FIG. 6). This construct provides for the antisense expression of SEQ ID NO: 33 from the enhanced 35S promoter.

EXAMPLE 3 Plant Transformation

Transgenic Brassica plants are obtained by Agrobacterium-mediated transformation as described by Radke et al., Theor. Appl. Genet., 75:685-694 (1988); Plant Cell Reports, 11:499-505 (1992). Transgenic Arabidopsis thaliana plants may be obtained by Agrobacterium-mediated transformation as described by Valvekens et al., Proc. Nat. Acad. Sci., 85:5536-5540 (1988), or as described by Bent et al., Science, 265:1856-1860 (1994), or Bechtold et al., C.R. Acad. Sci. Life Sciences, 316:1194-1199 (1993). Other plant species may be similarly transformed using related techniques.

Alternatively, microprojectile bombardment methods, such as described by Klein et al., Bio/Technology, 10:286-291, may also be used to obtain nuclear transformed plants.

EXAMPLE 4 Identification of Additional Homogentisate Prenyl Transferase

In order to identify additional homogentisate prenyl transferases, motifs identified through sequence homology are used to search a database of cDNA sequences containing full insert sequences. The cDNA database is first translated in all six frames and then an HMM search is done using an HMM model built for the motifs. All HMM hits are annotated by performing a BLAST search against a non-redundant amino acid database. All motifs are sensitive and identify homogentisate prenyl transferase sequences present in the database. Novel homogentisate prenyl transferase sequences are thereby discovered.
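
The six-frame translation step can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the standard genetic code, marks codons containing ambiguous bases as X, and does not reproduce the software actually used to translate the cDNA database, which is not specified here. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become X."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq):
    """Return the translations of frames +1..+3 and -1..-3, keyed by frame label."""
    frames = {}
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translations("ATGGAGTCTCTGCTC").items():
        print(frame, protein)
```

Each of the six translated sequences would then be passed to the HMM search, so that a motif is detected regardless of the strand or reading frame in which the cDNA insert was sequenced.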

EXAMPLE 5 Transgenic Plant Analysis

Arabidopsis plants transformed with constructs for the sense or antisense expression of the homogentisate prenyl transferase proteins are analyzed by High Performance Liquid Chromatography (HPLC) for altered levels of total tocopherols and tocotrienols, as well as altered levels of specific tocopherols and tocotrienols (e.g., α-, β-, γ-, and δ-tocopherol/tocotrienol).

Extracts of leaves and seeds are prepared for HPLC as follows. For seed extracts, 10 mg of seed is added to 1 g of microbeads (Biospec) in a sterile microfuge tube to which 500 ul 1% pyrogallol (Sigma Chem)/ethanol is added. The mixture is shaken for 3 minutes in a mini Beadbeater (Biospec) on “fast” speed. The extract is filtered through a 0.2 um filter into an autosampler tube. The filtered extracts are then used in the HPLC analysis described below.

Leaf extracts are prepared by mixing 30-50 mg of leaf tissue with 1 g microbeads and freezing in liquid nitrogen until extraction. For extraction, 500 ul 1% pyrogallol in ethanol is added to the leaf/bead mixture and shaken for 1 minute on a Beadbeater (Biospec) on “fast” speed. The resulting mixture is centrifuged for 4 minutes at 14,000 rpm and filtered as described above prior to HPLC analysis.

HPLC is performed on a Zorbax silica HPLC column (4.6 mm×250 mm), using a fluorescent detection monitor, with excitation and emission spectra set at 290 nm and 336 nm, respectively. Solvent A is hexane and solvent B is methyl-t-butyl ether. The injection volume is 20 ul, the flow rate is 1.5 ml/min, and the run time is 12 min (40° C.) using the table below:

Time      Solvent A   Solvent B
 0 min.   90%         10%
10 min.   90%         10%
11 min.   25%         75%
12 min.   90%         10%
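
For illustration, the solvent composition at any time during the run can be estimated from the table. The sketch below assumes that the pump changes composition linearly between the listed time points; the gradient shape between points is not stated above, so this is an assumption, and the names `GRADIENT` and `solvent_percentages` are hypothetical.

```python
# (time in minutes, percent solvent A) taken from the gradient table.
GRADIENT = [(0.0, 90.0), (10.0, 90.0), (11.0, 25.0), (12.0, 90.0)]


def solvent_percentages(t):
    """Return (%A, %B) at time t, assuming linear ramps between table points."""
    if not GRADIENT[0][0] <= t <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0-12 min run")
    for (t0, a0), (t1, a1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t <= t1:
            fraction = (t - t0) / (t1 - t0)
            percent_a = a0 + fraction * (a1 - a0)
            return percent_a, 100.0 - percent_a


if __name__ == "__main__":
    for minute in (0, 5, 10.5, 11, 12):
        print(minute, solvent_percentages(minute))
```

Under this assumption the run is isocratic at 90% hexane for the first 10 minutes, followed by a one-minute wash step rising to 75% methyl-t-butyl ether and a one-minute return to the starting conditions.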

Tocopherol standards in 1% pyrogallol/ethanol are also run for comparison (alpha tocopherol, gamma tocopherol, beta tocopherol, delta tocopherol, and tocopherol (tocol) (all from Matreya, State College, Pa., or Calbiochem, La Jolla, Calif.)).

Standard curves for alpha, beta, delta, and gamma tocopherol are calculated using Chemstation software. The absolute amount of component x is:

Absolute amount of x = Response_(x) × RF_(x) × dilution factor

where Response_(x) is the area of peak x, RF_(x) is the response factor for component x (Amount_(x)/Response_(x)), and the dilution factor is 500 ul. The ng/mg tissue is found by: total ng component/mg plant tissue.
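
The calculation above can be expressed as a short worked example. This is a sketch only: the function names are hypothetical, the numerical values in the usage section are invented for illustration and are not measured data, and the response factor is assumed to be obtained from a single standard injection rather than from a fitted curve.

```python
def response_factor(standard_amount, standard_response):
    """RF_x = Amount_x / Response_x, from a standard of known amount."""
    if standard_response <= 0:
        raise ValueError("standard response (peak area) must be positive")
    return standard_amount / standard_response


def absolute_amount(response, rf, dilution_factor=500.0):
    """Absolute amount of x = Response_x * RF_x * dilution factor (500 ul extract)."""
    return response * rf * dilution_factor


def amount_per_mg(total_ng, tissue_mg):
    """ng of component per mg of plant tissue."""
    if tissue_mg <= 0:
        raise ValueError("tissue mass must be positive")
    return total_ng / tissue_mg


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    rf = response_factor(standard_amount=2.0, standard_response=1000.0)
    total = absolute_amount(response=250.0, rf=rf)
    print(total, amount_per_mg(total, tissue_mg=10.0))
```

With the 10 mg seed sample described above, dividing the total amount by 10 gives the ng/mg value for seed tissue.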

Results of the HPLC analysis of seed extracts of transgenic Arabidopsis lines containing pMON10822 for the expression of SEQ ID NO: 33 from the napin promoter are provided in FIG. 8.

HPLC analysis results of Arabidopsis seed tissue expressing the SEQ ID NO: 33 sequence from the napin promoter (pMON10822) demonstrate an increased level of tocopherols in the seed. Total tocopherol levels are increased as much as 50 to 60% over the total tocopherol levels of non-transformed (wild-type) Arabidopsis plants (FIG. 8).

Results of the HPLC analysis of seed extracts of transgenic Arabidopsis lines 1387-1624 containing pMON10803 for the antisense expression of SEQ ID NO: 33 from the enhanced 35S promoter are provided in FIG. 9. Two lines, 1393 and 1401, show a substantial reduction in overall tocopherol levels, supporting the position that HPT is a homogentisate prenyl transferase involved in the synthesis of tocopherol.

Results of the HPLC analysis of seed extracts of transgenic Arabidopsis lines containing constructs for the expression of SEQ ID NOs: 5, 9-11, 43-44, 57-58, and 90 are obtained.

Results of the HPLC analysis of seed extracts of transgenic Arabidopsis lines containing constructs for the expression of SEQ ID NOs: 5, 9-11, 43-44, 57-58, and 90 from the enhanced 35S promoter are obtained.

EXAMPLE 6 Expression of a Homogentisate Prenyl Transferase as Single Gene, and in Combination with HPPD and tyrA in Soy

The Arabidopsis homogentisate prenyl transferase (ATPT2) (SEQ ID NO: 33) was cloned in a soy binary vector harboring an Arcelin 5 expression cassette. This expression cassette consisted of an Arcelin 5-promoter, a multi cloning site, and the Arcelin 5 3′-untranslated sequence in the order as described. Vector construction for this construct and the following constructs was performed using standard cloning techniques well established in the art and described in lab manuals such as Sambrook et al. (2001). The resulting binary vector for soy seed-specific expression of ATPT2 was designated pMON36581 (FIG. 10). Similarly, the Synechocystis homogentisate prenyl transferase (slr1736) (SEQ ID NO: 29) was fused to a chloroplast target peptide (CTP1), and cloned into the Arcelin 5 soy seed-specific expression cassette. The resulting binary plasmid was designated pMON69933 (FIG. 11). An additional binary plasmid for seed-specific co-expression of the Arabidopsis p-hydroxyphenylpyruvate dioxygenase (HPPD_(At)) and the bifunctional prephenate dehydrogenase from Erwinia herbicola (tyrA_(Eh)) (see WO 02/089561) was constructed by fusing the HPPD_(At)-gene and the tyrA_(Eh)-gene to the chloroplast target peptides, CTP2, and CTP1, respectively. These fusion genes were subsequently cloned into the multi cloning site of soy seed-specific expression cassettes consisting of the p7Sα′-promoter, a multi cloning site, and the E9 3′-untranslated region. The HPPD_(At) expression cassette was cloned into a binary vector downstream of the tyrA_(Eh) expression cassette resulting in the formation of pMON69924 (FIG. 12).

A fourth plasmid was constructed by cloning the Arcelin 5-expression cassette for slr1736 (SEQ ID NO: 29), downstream of the HPPD_(At), and the tyrA_(Eh) expression cassettes, resulting in the formation of pMON69943 (FIG. 13).

Each of these binary constructs was transformed into soybean. R1 seed pools from plants harboring these constructs were analyzed for tocopherol content and composition. For constructs pMON36581 and pMON69933, the seed for analysis were chosen at random. Seed from plants transformed with pMON69924 and pMON69943 showed a segregating dark phenotype. This phenotype has been associated with the presence of increased levels of homogentisic acid as a result of the expression of the transgenes HPPD and tyrA. Seed without dark coloration had wild-type tocopherol levels and were not transgenic. For this reason, colored seed were chosen for analysis of plants transformed with pMON69924 or pMON69943. To assess the impact of HPT expression on total tocopherol accumulation in a single gene vector, or in a multi gene vector, seed from non-transformed soy, or seed transformed with pMON69924, served as controls, respectively. FIG. 14 summarizes the tocopherol data obtained from these experiments. While expression of ATPT2 or slr1736 increased total tocopherol and tocotrienol levels in soy moderately, the impact of HPT expression in the context of a multi gene vector was much more pronounced. FIG. 14 demonstrates a significantly increased level of tocopherol and tocotrienol accumulation for pMON69943 compared to pMON69924 lines. These data suggest that combination of an HPT with tyrA and HPPD can substantially enhance tocopherol biosynthesis in soy.

Western analysis is carried out to detect the transgene expression in tissues harboring the gene of interest (GOI) expression cassette using the GOI protein specific antibody. Northern analysis is done for detecting the mRNA level of the transgene using the GOI sequence specific radiolabelled probe.

EXAMPLE 7 Identification of Additional Homogentisate Prenyl Transferase Sequences

In an analysis of the non-redundant amino acid database, Motifs II and IV (SEQ ID NOs: 40 and 42) identified, in addition to HPT sequences, two genomic variants of an Arabidopsis thaliana sequence related to HPTs (SEQ ID NOs: 61-62). These sequences are based on in silico prediction from genomic sequence by gene prediction algorithms. Further bioinformatic analysis showed that these sequences encoded an additional homogentisate prenyl transferase related to HPT. Both sequences (SEQ ID NOs: 61-62) were used to search the non-redundant amino acid database. The BLAST search results indicated that these sequences are related most to HPT sequences from cyanobacteria (SEQ ID NOs: 1-3) and Arabidopsis (SEQ ID NO: 7).

Alignment of gi15229898 (970 aa) (SEQ ID NO: 61) and gi10998133 (441 aa) (SEQ ID NO: 62) showed that:

a) the C terminal half of gi15229898 (SEQ ID NO: 61) overlaps with gi10998133 (SEQ ID NO: 62);
b) the last 40-50 aa in the C terminal portions of these two proteins do not align; and
c) the N terminal of gi15229898 also does not align with HPTs (SEQ ID NOs: 1-7, and 9-11).

These findings indicate a discrepancy in the coding sequence prediction reported in Genbank.

In order to verify the predicted sequence, the BAC sequence of the Arabidopsis genome corresponding to the region was downloaded from Genbank (gi|12408742|gb|AC016795.6|ATAC016795, 100835 bp). Coding sequences were predicted from this BAC clone using the FGENESH gene prediction program (Solovyev V. V. (2001) Statistical approaches in Eukaryotic gene prediction: in Handbook of Statistical Genetics (eds. Balding D. et al.), John Wiley & Sons, Ltd., p. 83-127). FGENESH predicted 28 proteins from this BAC clone. To identify new homogentisate prenyl transferase proteins among the 28 FGENESH predicted proteins, all 28 predicted proteins were blasted against the non-redundant amino acid database. FGENESH predicted protein No. 25 (402 aa) (SEQ ID NO: 45) was most similar to gi10998133 (441 aa) (SEQ ID NO: 62), the C terminal half of gi15229898 (970 aa) (SEQ ID NO: 61), and HPTs (SEQ ID NOs: 1-7, and 9-11).

To provide functional and transcriptional evidence and to confirm the coding sequence for this gene, a plant EST sequence database comprising proprietary and public sequences was searched. We found several ESTs (SEQ ID NOs: 63-72) which match the N terminal and C terminal portions of this gene. The new gene was named HPT2 (SEQ ID NO: 59) from Arabidopsis. The HPT2 (SEQ ID NO: 57) sequence is quite distinct from HPT1 (SEQ ID NO: 7).

HPT2 (SEQ ID NO: 57) from Arabidopsis is also known as Tocopherol Synthase (TS). Present data suggest that the overexpression of TS leads to a similar increase in the amount of overall tocopherol, over the wild type, as with HPT1 (SEQ ID NO: 33). However, the enzymes may have different biochemical characteristics because the overexpression of TS results in less production of delta tocopherol than the overexpression of HPT1 (SEQ ID NO: 33).

The presence of a chloroplast transit peptide in the HPT2 Arabidopsis sequence (SEQ ID NOs: 45 and 57) was verified using the ChloroP program (Emanuelsson, Nielsen, and von Heijne, ChloroP, a neural network-based method for predicting chloroplast transit peptides and their cleavage sites. Protein Science, 8:978-984 (1999)).

In addition to SEQ ID NOs: 1, 7, and 9-11 (HPT), SEQ ID NOs: 57-58, and 90 (HPT2) were added to the alignment (see FIGS. 24-25) and the resulting motifs analyzed. Motif V (SEQ ID NO: 46), VII (SEQ ID NO: 48), and VIII (SEQ ID NO: 49) are specific to HPT and HPT2 sequences. An HMM search of the non-redundant amino acid database using these motifs identified only cyanobacteria (SEQ ID NOs: 1-3, and 43), photobacteria (SEQ ID NO: 44), and plant HPTs (SEQ ID NOs: 7, and 61-62). Motif VII (SEQ ID NO: 48) identified distantly related ubiA prenyl transferase from bacteria in addition to homogentisate prenyl transferase. However, the sensitivity of Motif VII to homogentisate prenyl transferase was higher. Homogentisate prenyl transferases had e-values lower by several orders of magnitude and higher alignment scores (higher than 30). HPT2 sequences are distinct from HPT and cyanobacterial HPTs as demonstrated by the sequence dendrogram in FIG. 26.

SEQ ID NOs: 43-44 were added to an alignment with SEQ ID NOs: 1-4, 6-7, 9-11, 57-58, and 91 (see FIGS. 33-34), and the resulting motifs (SEQ ID NOs: 92-95, Motifs IX-XII) were analyzed. Specificity of these motifs to homogentisate prenyl transferases was confirmed by HMM search. A non-redundant database containing more than 1.34M sequences was searched using HMM models built from the alignments shown in FIG. 34 for Motifs IX-XII. The E value limit for the search was set at 1.0. All four motifs identified only homogentisate prenyl transferase from cyanobacteria, photobacteria and Arabidopsis. Upper E value limits for Motif IX, X, XI, and XII were 0.9, 11E10E⁻¹¹, 0.03, 8E10⁻⁸ respectively. The small size of the motifs resulted in higher E values for Motif IX and XI.

EXAMPLE 8 Transformation and Expression of a Wild Type Arabidopsis HPT2 Gene in Sense and Antisense Orientations with Respect to Seed-specific and Constitutive Promoters in Arabidopsis thaliana

The HPT2 full-length cDNA (SEQ ID NO: 59) is excised from an EST clone, CPR23005 (pMON69960-FIG. 15), with SalI and NotI enzymes, blunt-ended and cloned in between the napin promoter and napin 3′ end at the blunt-ended SalI site in sense and antisense orientations with respect to the napin promoter in pMON36525 (FIG. 16) to generate recombinant binary vectors pMON69963 (FIG. 17) and pMON69965 (FIG. 18), respectively. The sequence of the HPT2 cDNA is confirmed by sequencing with napin 5′-sense (5′-GTGGCTCGGCTTCACTTTTTAC-3′) (SEQ ID NO: 50) and napin 3′-antisense (5′-CCACACTCATATCACCGTGG-3′) (SEQ ID NO: 51) primers using standard sequencing methodology. The HPT2 cDNA used to generate pMON69963 and pMON69965 is also cloned in between the enhanced 35S promoter and E9-3′ end at the blunt-ended BglII and BamHI sites of pMON10098 (FIG. 19) to generate pMON69964 (FIG. 20) and pMON69966 (FIG. 21) in sense and antisense orientations with respect to the enhanced 35S promoter, respectively. Additional HPT2 internal primers synthesized to completely sequence the whole HPT2 cDNA are listed in the table below:

A list of primers used for confirming the HPT2 cDNA sequence.

Primer   Description               Sequence
BXK169   HPT2/CPR23005/sense       5′-CAGTGCTGGATAGAATTGCCCGGTTCC-3′ (SEQ ID NO: 52)
BXK170   HPT2/CPR23005/sense       5′-GAGATCTATCAGTGCAGTCTGCTTGG-3′ (SEQ ID NO: 53)
BXK171   HPT2/CPR23005/antisense   5′-GGGACAAGCATTTTTATTGCAAG-3′ (SEQ ID NO: 54)
BXK172   HPT2/CPR23005/antisense   5′-GCCAAGATCACATGTGCAGGAATC-3′ (SEQ ID NO: 55)
BXK173   HPT2/CPR23005/sense       5′-GTGGAGTGCACCTGTGGCGTTCATC-3′ (SEQ ID NO: 56)

The plant binary vectors pMON69963 and pMON69965 are used in Arabidopsis thaliana plant transformation to direct the sense and antisense expression of HPT2 in the embryo. The binary vectors pMON69964 and pMON69966 are used in Arabidopsis thaliana plant transformation for sense and antisense expression of HPT2 in the whole plant. The binary vectors are transformed into ABI strain Agrobacterium cells by electroporation (Bio-Rad Electroprotocol Manual, Dower et al., Nucleic Acids Res., 16:6127-6145 (1988)). Transgenic Arabidopsis thaliana plants are obtained by Agrobacterium-mediated transformation as described by Valvekens et al., Proc. Nat. Acad. Sci., 85:5536-5540 (1988), Bent et al., Science, 265:1856-1860 (1994), and Bechtold et al., C.R. Acad. Sci., Life Sciences, 316:1194-1199 (1993). Transgenic plants are selected by sprinkling the transformed T₁ seeds onto selection plates containing MS basal salts (4.3 g/L), Gamborg's B-5, 500X (2.0 g/L), sucrose (10 g/L), MES (0.5 g/L), phytagar (8 g/L), carbenicillin (250 mg/L), cefotaxime (100 mg/L), plant preservation medium (2 ml/L), and kanamycin (60 mg/L) and then vernalizing them at 4° C. in the absence of light for 2-4 days. The seeds are transferred to 23° C. and a 16/8 hours light/dark cycle for 5-10 days until seedlings emerge. Once one set of true leaves is formed on the kanamycin resistant seedlings, they are transferred to soil and grown to maturity. The transgenic lines generated through kanamycin selection are grown under two different light conditions. One set of the transgenic lines is grown under 16 hrs light and 8 hrs dark and another set of the transgenic lines is grown under 24 hrs light to study the effect of light on seed tocopherol levels. The T₂ seed harvested from the transformants is analyzed for tocopherol content. The results from the seed total tocopherol analysis from lines grown under both normal and high light conditions are presented in FIGS. 22 and 23. Seed-specific overexpression of HPT2 under normal and high light conditions produced a significant 1.6- and 1.5-fold increase in total tocopherol levels (alpha=0.05; Tukey-Kramer HSD) (SAS Institute, 2002, JMP version 5.0).

Expression of HPT2 using the constitutive promoter, e35S, produced about a 20% increase in seed total tocopherol levels as compared to control under both light conditions. Maximum tocopherol level reduction in lines harboring the enhanced 35S::HPT2 antisense construct was 20%. Overall, the significant increase in seed total tocopherol level in the Arabidopsis thaliana lines harboring the HPT2 driven by the napin promoter suggests that HPT2 plays a key role in tocopherol biosynthesis.

Western analysis is carried out to detect the transgene expression in tissues harboring the gene of interest (GOI) expression cassette using the GOI protein specific antibody. Northern analysis is done for detecting the mRNA level of the transgene using the GOI sequence specific radiolabelled probe.

EXAMPLE 9 Preparation of Plant Binary Vector for Expression of HPT2 from Arabidopsis in Combination with Tocopherol Pathway Genes

To investigate the combinatorial effect of HPT2 with other key enzymes in the pathway, a plant binary vector containing seed-specifically expressed hydroxyphenylpyruvate dioxygenase (HPPD), bifunctional prephenate dehydrogenase tyrA, and HPT2 (pMON81028-FIG. 27) is prepared. The pMON81028 is made by excising the pNapin::HPT2::Napin 3′ expression cassette from pMON81023 (FIG. 28) with Bsp120I and NotI enzymes and ligating it to pMON36596 (FIG. 29) at the NotI site. The pMON36596 contains the pNapin::CTP2::HPPD::Napin 3′ and pNapin::CTP1::TyrA::Napin 3′ expression cassettes. The pMON81028 is transformed into Arabidopsis thaliana plants using the method described in Example 8.

EXAMPLE 10 Preparation of Construct for Bacterial Expression of HPT2 from Arabidopsis

The EST clone CPR23005 containing the HPT2 full length cDNA is used as a template for PCR to amplify the HPT2 cDNA fragment that codes for the mature form of the HPT2 protein. Two sets of PCR products are generated for cloning into the pET30a(+) vector (Novagen, Inc.) (FIG. 30) to produce HPT2 protein with and without a his tag. The primer set BXK174 (5′-CACATATGGCATGTTCTCAGGTTGGTGCTGC-3′) (SEQ ID NO: 84) and BXK176 (5′-GCGTCGACCTAGAGGAAGGGGAATAACAG-3′) (SEQ ID NO: 85) is used for cloning HPT2 at the NdeI and SalI sites of pET30a(+), behind the T7 promoter, to generate mature HPT2 protein without the his tag. The resulting recombinant vector is named pMON69993 (FIG. 31). The primer set BXK175 (5′-CAACCATGGCATGTTCTCAGGTTGGTGCTGC-3′) (SEQ ID NO: 86) and BXK176 (5′-GCGTCGACCTAGAGGAAGGGGAATAACAG-3′) (SEQ ID NO: 87) is used to generate the HPT2 PCR product for cloning at the NcoI and SalI sites of pET30a(+) to produce mature HPT2 with the his tag. The recombinant vector is named pMON69992 (FIG. 32). pMON69993 and pMON69992 are used to produce bacterially expressed HPT2 for enzyme assays to confirm its homogentisate prenyl transferase activity and its specificity towards geranylgeranyl pyrophosphate, phytyl pyrophosphate, and solanyl pyrophosphate substrates.
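
As an illustrative check of the primer design described above, the sketch below locates the cloning sites in each primer and reverse-complements the shared reverse primer BXK176 to show the stop codon it introduces on the coding strand. The recognition sequences used (NdeI CATATG, NcoI CCATGG, SalI GTCGAC) are the published ones; the helper names are hypothetical and this is not presented as the procedure used to design the primers.

```python
SITES = {"NdeI": "CATATG", "NcoI": "CCATGG", "SalI": "GTCGAC"}

PRIMERS = {
    "BXK174": "CACATATGGCATGTTCTCAGGTTGGTGCTGC",
    "BXK175": "CAACCATGGCATGTTCTCAGGTTGGTGCTGC",
    "BXK176": "GCGTCGACCTAGAGGAAGGGGAATAACAG",
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def sites_in(primer):
    """Return the sorted names of the listed enzymes whose site occurs in the primer."""
    return sorted(name for name, site in SITES.items() if site in primer.upper())


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, sites_in(seq))
    # Coding-strand view of the reverse primer: last codons, stop codon, SalI site.
    print(reverse_complement(PRIMERS["BXK176"]))
```

Read on the coding strand, the reverse primer ends the open reading frame with a TAG stop codon immediately before the SalI site, while each forward primer supplies an ATG within its NdeI or NcoI site.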

1. (canceled)

2. A substantially purified polypeptide molecule comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 5, 9-11, 57-58, and 90.

3-29. (canceled)

30. A substantially purified polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95, wherein said amino acid sequence is not derived from a nucleic acid molecule that is derived from Nostoc punctiforme, Anabaena, Synechocystis, Zea mays, Glycine max, Arabidopsis thaliana, Oryza sativa, Trichodesmium erythraeum, Chloroflexus aurantiacus, wheat, leek, canola, cotton, or tomato.

31. The polypeptide of claim 30, wherein more than one amino acid sequence is selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95.

32. The polypeptide of claim 31, wherein said amino acid is not derived from a nucleic acid that is derived from Sulfolobus, Aeropyrum, or sorghum.

33-38. (canceled)

39. A substantially purified polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95, wherein said amino acid sequence does not comprise any of the amino acid sequences set forth in sequence listings in WO 00/68393, WO 00/63391, WO 01/62781, or WO 02/33060, and does not comprise SEQ ID NOs: 1-11, 43-45, 57-58, 61-62, and 90 from the present application.

40. The polypeptide of claim 39, wherein more than one amino acid sequence is selected from the group consisting of SEQ ID NOs: 39-42, 46-49, and 92-95.

41-44. (canceled)

45. A substantially purified polypeptide with homogentisate prenyl transferase activity comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 43-44.

46-49. (canceled)

50. Homogentisate prenyl transferase polypeptide sequences identified using any of the alignments set forth in the group consisting of FIGS. 2 a-2 c, 3 a-3 c, 24 a-24 b, 25 a-25 b, 33 a-33 c, 34 a-34 b, and 35 a-35 b in a profile based model, excluding the amino acid sequences set forth in sequence listings in WO 00/68393, WO 00/63391, WO 01/62781, or WO 02/33060 and do not comprise SEQ ID NOs: 1-11, 43-45, 57-58, 61-62, and 91 from the present application.

51. Homogentisate prenyl transferase polypeptide sequences identified using any of the alignments set forth in FIGS. 2 a-2 c, 3 a-3 c, 24 a-24 b, 25 a-25 b, 33 a-33 c, 34 a-34 b, and 35 a-35 b in a profile based model, wherein said amino acid sequence is not derived from a nucleic acid molecule that is derived from Nostoc punctiforme, Anabaena, Synechocystis, Zea mays, Glycine max, Arabidopsis thaliana, Oryza sativa, wheat, leek, canola, cotton, or tomato.

52. The homogentisate prenyl transferase of claim 50, wherein the profile based model is an HMM model.

53. The homogentisate prenyl transferase of claim 52, wherein the profile based sequence search method is a HMM model generated using HMMER package version 2.2 g with default parameters.